From oleg-at-okmij.org Sun Nov 19 16:54:34 2006 Subject: Type-class overloaded functions: second-order typeclass programming with backtracking From: oleg-at-pobox.com Message-ID: <20061120005434.2B3A9AB40@Adric.metnet.fnmoc.navy.mil> Date: Sun, 19 Nov 2006 16:54:34 -0800 (PST) To: haskell@haskell.org Status: OR We demonstrate functions polymorphic over classes of types. Each instance of such (2-polymorphic) function uses ordinary 1-polymorphic methods, to generically process values of many types, members of that 2-instance type class. The typeclass constraints are thus manipulated as first-class entities. We also show how to write typeclass instances with back-tracking: if one instance does not apply, the typechecker will chose the `next' instance -- in the precise meaning of `next'. The code only uses stable Haskell extensions (undecidable and overlapping instances) and is tested with GHC 6.4.1 - 6.8.2. Our running example is a function whose type (modulo the constraint) is approx_eq :: a -> b -> Bool It applies to values of any two types. If both arguments are of the same type and are Fractionals, the function checks if the values are the same within given tolerance, 0.5. We test the values of other numeric types for equality give or take 1. For a class of other types supporting equality, we test the values with the Eq's (==) operation. For all other types, including the cases when the two arguments are of different types, approx_eq returns False. The complete code for this article is in the file poly2.hs of this directory. The first inspiration for this message comes from generic programming. In approaches like SYB1 or Smash, the functions that process (sub)terms of specific types must be monomorphic. In SYB1, for example, that requirement comes from the fact that TypeRep only supports monomorphic types (see also a note in Data.Dynamic documentation). One would like, for example, to traverse a term and apply a polymorphic numeric function to all subterms that are members of the class Num -- without the need to monomorphise the function. The second inspiration comes from Haskell-Cafe requests for a function that does something for any Fractional and does something else for a Num and something else entirely for a value whose type is in the class Ord. The common solution to those problems is to monomorphise -- enumerate all needed types. That leads to a notable amount of boilerplate. We'd like to process all Num values, for example, with one polymorphic function -- without manually instantiating this function for each Num type. Another drawback of the common solution is the need to enumerate Num and Fractional classes again should we need another function that does something for Fractionals and something else for Nums. We show how to eliminate that redundancy. We offer a way to describe classes of types in a _concise_ way, using unions and class differences. These classes of types may be either closed or open (extensible). After that set up, we can write arbitrarily many functions overloaded over these type classes. An instance of our function for a specific type class may use polymorphic functions to generically process all members of that type class. Our functions are hence second-order polymorphic. Again, the class membership has to be described -- but only once and for all. Furthermore, we offer quite an expressive notation and the opportunity of augmenting a class at a later time. For our running example, we define the following type classes: > type Fractionals = Float :*: Double :*: HNil > type Nums = Int :*: Integer :*: AllOf Fractionals :*: HNil > type Ords = Bool :*: Char :*: AllOf Nums :*: HNil > type Eqs = AllOf (TypeCl OpenEqs) :*: AllOfBut Ords Fractionals :*: HNil > data OpenEqs > instance TypeCls OpenEqs () HTrue -- others can be added in the future In particular, Eqs specifies the class of all of Ords except Fractionals, plus the open OpenEqs. The latter currently includes (), with more types can be added at any point. We excluded Fractionals from Eqs just because we can -- and to hint that exact equality on Fractionals is unreliable. Incidentally, while the following naive definition does not type type Russel = AllOfBut () Russel :*: HNil because type synonyms can't be recursive, a more elaborated variant does work, with the expected result. Thus when we say type _class_ (rather than type _set_), we really mean it. Our implementation is a rather trivial extension of the Apply class of HList: class Apply f a r | f a -> r where apply :: f -> a -> r; apply = undefined Our 2-polymorphic functions (and their 1-polymorphic specializations) are all instances of the Apply class. They are all identified by labels (singular datatypes) for the reasons to be explained shortly. Regular typeclasses match on types; to match on a type class, we need a guard: a type function that decides the membership. We also need the ability to `backtrack' and select another instances should the guard fail. Let us start with an example: a simple 2-polymorphic function "a -> Bool" that returns True if its argument is a member of the type class Eqs defined above. It returns False for any other type. We start by introducing the label for our function: > data IsAnEq = IsAnEq Next we define a pair of matching instances. The first instance, of a typeclass > class GFN n f a pred | n f a -> pred specifies the guard for the instantiation number `n' of the 2-polymorphic function named `f' when applied to an argument of the type 'a'. The guard itself is a label, which, when Apply'ed to 'a' yields either HTrue or HFalse. Again, we shall see soon why we need so many labels. > instance GFN Z IsAnEq a (Member Eqs) > instance Apply (GFnA Z IsAnEq) a Bool where > apply _ _ = True We are saying here that the instantiation number Z of the function named IsAnEq has the guard (Member Eqs) -- which tests the class membership of the type 'a' in Eqs. Should this guard succeed, we `apply' (GFnA Z IsAnEq) to the argument to get the desired result -- in this case, True. The second instance is the catch-all, it applies when the guard (Member Eqs) fails: > -- the default instance > instance TypeCast pred Otherwise => GFN n IsAnEq a pred > instance Apply (GFnA n IsAnEq) a Bool where > apply _ _ = False where the Otherwise membership predicate always succeeds. We now write > test1 = [apply (GFn IsAnEq) (), apply (GFn IsAnEq) (1.0::Double), > apply (GFn IsAnEq) 'a'] which evaluates to [True,False,True]. Recall that we explicitly excluded Fractionals from Eqs. Suppose that we wish to extend the above function to pairs, that is, given a pair (x,y) it should return True if both 'x' and 'y' are in Eqs. The simplest way is to extend the type class Eqs accordingly, given that we specifically made it open. We choose a different approach however, to illustrate that the function IsAnEq is itself extensible, and can be recursively applied. We add the following pair of instances: > instance GFN (S Z) IsAnEq (x,y) Otherwise > instance (Apply (GFn IsAnEq) x Bool, > Apply (GFn IsAnEq) y Bool) > => Apply (GFnA (S Z) IsAnEq) (x,y) Bool where > apply (GFnA f) (x,y) = apply (GFn f) x && apply (GFn f) y We wedge this instance `between' the existing two. > test2 = [apply (GFn IsAnEq) (True,'a'), > apply (GFn IsAnEq) (1.0::Double,True)] > -- [True, False] Let us move to the main example: the approximate equality: test the Fractionals with the tolerance 0.5, test other Nums with the tolerance 1, and test Eqs with the exact equality. For the values of all other types (including the case when the values to compare have different types), the result is False. The order of guard tests obviously matters as all of Fractionals are also Nums. If we want to discriminate Fractionals, we should test for them first. > data ApproxEq = ApproxEq -- define the label > > data PairOf t -- lift membership to pairs > instance Apply t x r => Apply (PairOf t) (x,x) r > instance TypeCast r HFalse => Apply (PairOf t) x r > > instance GFN Z ApproxEq (x,x) (PairOf (Member Fractionals)) > instance (Fractional x, Ord x) => > Apply (GFnA Z ApproxEq) (x,x) Bool where > apply _ (x,y) = abs (x - y) < 0.5 > > instance GFN (S Z) ApproxEq (x,x) (PairOf (Member Nums)) > instance (Num x, Ord x) => > Apply (GFnA (S Z) ApproxEq) (x,x) Bool where > apply _ (x,y) = abs (x - y) < 2 > > instance GFN (S (S Z)) ApproxEq (x,x) (PairOf (Member Eqs)) > instance (Eq x) => > Apply (GFnA (S (S Z)) ApproxEq) (x,x) Bool where > apply _ (x,y) = x == y -- recursion over pairs skipped... > -- The default instance > instance TypeCast pred Otherwise => GFN n ApproxEq a pred > instance Apply (GFnA n ApproxEq) a Bool where > apply _ _ = False > > -- A convenient abbreviation > approx_eq x y = apply (GFn ApproxEq) (x,y) *Poly2> :t approx_eq approx_eq :: (Apply (GFn ApproxEq) (a, b) r) => a -> b -> r > test3 = [approx_eq (1.0::Double) (1.5::Double), > approx_eq (1.0::Float) (1.1::Float), > approx_eq (1::Integer) (2::Integer), > approx_eq (1::Int) True, > approx_eq (Just ()) [], > approx_eq ((2::Integer),(2::Integer)) ((1::Integer),(2::Integer)), > approx_eq ((1::Integer),(2::Integer)) ((1::Integer),(2::Integer)) ] > -- [False,True,True,False,False,True,True] Now why we need so many labels and why we have to separate the guard and the real computation across two different typeclasses? First of all, why can't we write something like the following (as in `Smash your boilerplate') and so use real functions rather than labels? approx_eq2 x y = sapply (scons (undefined::PairOf (Member Fractionals)) eqfrac (scons (undefined::PairOf (Member Nums)) eqnum (scons (undefined::PairOf (Member Eqs)) (uncurry (==)) snil))) (x,y) False where eqfrac (x,y) = abs (x - y) < 0.5 eqnum (x,y) = abs (x-y) < 2 If we do that and write > testeq2 = approx_eq2 (1.0::Double) (1.5::Double) we get a type error, `Ambiguous Eq b constraint'. Indeed, the use of (==) in the code above gives rise to an Eq b constraint. That 'b' is not directly related to the type of the arguments to approx_eq2 -- the function (==) will be applied only if the guard `PairOf (Member Eqs)' succeeds. Thus we have no direct way to instantiate the type 'b' in general, and so the constraint 'Eq b' is left hanging, eventually leading to a type error. Second, why can't we define the guard and the corresponding computation within the same instance? For example, we could have declared a typeclass > class GFN n f a pred b | n f a -> pred, n f a -> b where > gfn :: n -> f -> a -> b Instances of GFN define a 2-polymorphic function as a set of clauses. The clauses are `numbered' via the 'n' parameter, which is of a kind numeral (Z, S Z, etc). Pred is the guard, and the numbering establishes the order, so we can find the `next' instance if the guard, applied to the argument 'a', fails (that is, `Apply guard a HFalse' is derivable). Alas, to find the guard `pred', we have to select the corresponding instance. We replace a failure with the sequence of successes, selecting one instance after another, until we find the one with the succeeding guard. Those instances may come with the constraints; as we select one instance after another, the constrains accumulate (forming the conjunction). But these constraints may be contradictory. The two failing approaches show the complication we have to struggle with. A 2-polymorphic function is defined by cases where each case is, in general, a 1-polymorphic function. Each 1-polymorphic function comes with a set of constraints. We must make sure that our 2-polymorphic function makes a _disjoint_ union of those constraints rather than an intersection. Thus, unless we have committed to a particular 1-polymorphic case (the guard succeeded), we must not evaluate the corresponding constraints. Our solution has this property. We notice in the ApproxEq cases above that none of the GFN instances have any instance constraints. The corresponding Apply instances do have constraints -- which, fortunately, only take effect when the guard succeeded and the instance is selected. The name of the game is to delay the imposition of constraint. We take advantage of the fact that a label such as ApproxEq _stands_ for a set of constraints but does not have any actual constraints. We play on the difference between the notation and denotation. The need to spread the guard and the corresponding computation across two instances is less satisfying. Fortunately, the type system watches for the consistency between the two, that is, the guard admitting no more types that can be used in the real computation. It seems appropriate to end this message with 14 lines of implementation. The complete code with more examples is available at the URL given above. > newtype GFn f = GFn f > newtype GFnA n f = GFnA f > > newtype GFnTest n f flag = GFnTest f > > instance (GFN Z f a pred, Apply pred a flag, > Apply (GFnTest Z f flag) a b) > => Apply (GFn f) a b where > apply (GFn f) a = apply ((GFnTest f)::GFnTest Z f flag) a > > instance Apply (GFnA n f) a b -- guard succeeded > => Apply (GFnTest n f HTrue) a b where > apply (GFnTest f) a = apply ((GFnA f) :: GFnA n f) a > > instance (GFN (S n) f a pred, Apply pred a flag, -- else chose the next inst > Apply (GFnTest (S n) f flag) a b) > => Apply (GFnTest n f HFalse) a b where > apply (GFnTest f) a = apply ((GFnTest f)::GFnTest (S n) f flag) a