From oleg-at-okmij.org Thu Aug 10 21:16:34 2006
Date: 10 Aug 2006 21:16:34 -0000
Message-ID: <20060810211634.71649.qmail@eeoth.pair.com>
To: haskell@haskell.org
Subject: Smash your boiler-plate without class and Typeable
Status: RO
We describe a new generic programming approach, which is powerful
enough to traverse a term and return another term of a different type,
determined by the original term's type/structure. The example is a
generic function that replaces all Floats with Doubles in an arbitrary
term.
The approach combines the features of the original `Scratch your
boilerplate' (SYB1, presented at TLDI2003) and the third version SYB3
(ICFP05):
-- like SYB1, our generic function is an ordinary function rather
than a typeclass method. Unlike SYB3, the end user of our library
defines no typeclasses. We do not need any extensions like
recursive instances either.
-- unlike SYB1, we need neither Typeable nor (safe) typecast
-- unlike SYB1 and SYB3, we do not need higher-rank types
-- like SYB3, the library needs overlapping and undecidable
instances. Unlike SYB3, the end user needs no such extensions.
In fact, the use of these extensions is confined to a small part of
the library, namely, to the implementation of TypeEq.
The essence of our approach is the _typelevel_ typecase. It is
inspired by the deepest functor, which has
http://okmij.org/ftp/Haskell/typecast.html#deepest-functor
This message and code have greatly benefited from discussions with
Chung-chieh Shan, who also suggested the title.
Our generic function is composed of a specific and generic parts. A
specific part tells us what to do if the input term happens to be of
some particular type. The generic part tells us what to do with all
other terms. The specific part is just an HList of ordinary functions,
with different argument types. If the type of the input term matches
the argument type of one of the functions in the list, we apply that
function. If no specific function applies, we do the generic action,
implicit in the traversal strategy (for example, apply the generic
function to the subterms and reduce the results).
Generic programming described in the original SYB1 paper introduced
two fundamental ideas:
-- generic functions like |gmapQl op seed f term|, which
traverses an arbitrary term and applies the user-defined function
f to each subterm and reduces the results with a left-associative
operator op.
-- generic function f that transforms arbitrary (sub)terms of
arbitrarily types. That transformation is specified as a composition
of specific transformations (which apply only to the values of some
specific types) and a generic, default transformation for all other
types.
In SYB1, the function gmapQl is implemented as a method of a class
Data; the SYB library provides instances for all (interesting)
types. The function 'f', OTH, is essentially a `typecase' operator, to
tell one specific type from the rest. SYB relies on a run-time
typecase, depending on the run-time type representation -- which is
provided by the typeclass Typeable. The typeclass also has the
method `cast' for a safe cast from the value of `generic type' to the
value of a specific type.
We observe that the typecase, _on the type level_, has always existed
in Haskell (although that fact was not perhaps realized until
HList). That typecase, which does not need any `cast' operation, is
the type equality predicate TypeEq.
We consider here two classes of generic functions. The first class of
generics takes a term and returns a value of some fixed type. The
example is a generic size function, counting the number of data
constructors in a term with the specific size measure assigned to
values of particular types, e.g., strings. The second class of
generics takes a term and returns a potentially different term -- in
value as well as in type. For example, replacing Floats with Doubles
or vice versa, or replacing all tuples of integers in a complex term
with a 2-element array.
Our first example uses a generic query function gmapq, which
recursively traverses a term in the depth-first order, and applies
reducer to the list of term's children. First, we define a function
that counts the number of data constructors and values of primitive
types in any term:
> gsize a = gmapq SNil (\l -> 1 + sum l) a
> test1 = gsize (1::Int) -- 1
> test2 = gsize [1::Int,2,3] -- 7 == 3 integers + three (:) + one []
> test3 = gsize "abc" -- 7, as above, with Char instead of Int
> test4 = gsize ["abc"] -- 9: one extra (:) and one extra []
Now, we shall treat strings specially: we assign each string size
999. We only need a small modification to gsize':
> gsize' a = gmapq (SCons (\ (_::String) -> (999::Int)) SNil) (succ . sum) a
> test1' = gsize' (1::Int)
> test2' = gsize' [1::Int,2,3] -- 7
> test3' = gsize' "abc" -- 999
> test4' = gsize' ["abc"] -- 1001
In a different example, we check if a given data structure
contains somewhere the letter 'a' (or an integer with the 'a' code):
> hasa a = gmapq (SCons (== 'a') (SCons (== (fromEnum 'a')) SNil)) or a
> testh = (hasa ('x',False),
> hasa ('x',97::Int), hasa 'a', hasa [[["cde"],["abc"]]])
> -- (False,True,True,True)
For generic term transformations, we take the following term:
> term1 = ([1::Int,2], (True,('2',[(3::Int,4::Int)])))
Let us increment all integers in that term:
> inci a = gtmapq (HCons (\ (x::Int) -> x + 1) HNil) a
> testi1 = inci term1
-- result: ([2,3],(True,('2',[(4,5)])))
Let us replace all Int tuples (x,y) with an array [x,y], and
negate all booleans
> p2l a = gtmapq (HCons (\ (x::Int,y) -> [x,y]) (HCons not HNil)) a
> test_p2l = p2l term1
-- result: ([1,2],(False,('2',[[3,4]])))
Let us replace an Int with a Double everywhere
> i2d a = gtmapq (HCons (fromIntegral::Int->Double) HNil) a
> test_i2d = i2d term1
-- result: ([1.0,2.0],(True,('2',[(3.0,4.0)])))
The complete implementation along with the examples is available at
http://okmij.org/ftp/Haskell/syb4.hs
The following is the key part of the implementation of gmapq. First,
we introduce a heterogeneous list of functions a->w where all
functions in the list have the same return type w. The argument type
'a' varies from one list member to another. The property that each
function in the list has the same result type is guaranteed by the
construction of the datatype.
> data SNil w = SNil
> data SCons a b w = SCons (a->w) (b w)
Given the datum of the type 'a', we check if any function in the list
'spec w' has the argument type 'a' and so can be applied. If there is
such a function, we apply it. Otherwise, we return the default,
the third argument of sapply.
> class SApply spec a where
> sapply :: spec w -> a -> w -> w
>
> instance SApply SNil a where
> sapply _ _ deflt = deflt
>
> instance (TypeEq a a' bf, SApply' bf (SCons a' r) a)
> => SApply (SCons a' r) a where
> sapply = sapply' (undefined::bf)
>
> class SApply' bf spec a where
> sapply' :: bf -> spec w -> a -> w -> w
>
> instance SApply' HTrue (SCons a r) a where
> sapply' _ (SCons p _) x _ = p x
>
> instance SApply r a => SApply' HFalse (SCons a' r) a where
> sapply' _ (SCons _ r) x deflt = sapply r x deflt