previous  next  contents  top

Data-Generic and Data-Extensible Programming in Haskell

 


 

Polymorphic variants: solving the expression problem

The existing HList records support, without further programming, polymorphic variants, i.e., extensible recursive open sum datatypes, quite similar to polymorphic variants of OCaml. Variant subtyping is automatic. HList thus solves the familiar (in OO, at least) `expression problem' -- the ability to add new alternatives to a datatype and extend old processing functions to deal with the extended variant, maximally reusing old code without changing it. Furthermore, values of unextended datatypes are processed by extended functions without any ado. In contrast, attempting to apply non-extended functions to the values of the extended datatype results in a type error.

Our polymorphic variants are literally open co-products: dual of extensible polymorphic records of HList. The duality is negation:

     NOT (A | B) -> (NOT A & NOT B)
This implication is a part of the deMorgan law that holds both in classical, and, more importantly, in intuitionistic logics. Our encoding of sums is the straightforward Curry-Howard image of that law.

Our implementation of polymorphic variants in terms of HList records uses no type classes, no type-level programming or any other type hacking. In fact, there are no type annotations, type declarations, or any other mentioning of types, except in the comments.

Version
The current version is 1.0, Jul 2, 2006.
References
VariantP.hs [5K]
The complete code with many examples. The code is based on (and included in the distribution of) the HList library.

open-coproduct.txt [4K]
Polymorphic variants (extensible, recursive sums) with HList
The message posted on the Haskell mailing list on Sun, 2 Jul 2006 18:35:12 -0700 (PDT)

< http://guppy.eng.kagawa-u.ac.jp/~kagawa/PVH >
Koji Kagawa: Polymorphic Variants in Haskell
Presented at the Haskell Workshop 2006
This paper proposes a quite different solution to the expression problem, encoding constructors of open `data types' as class methods. Our example is patterned after the running example of this paper.

 

A variation on the theme of SYB3

We describe a variation of SYB3 type-class-based generic programming framework that avoids both higher-rank types and mutually recursive instances. Because of the latter our code, unlike SYB3, works even in Hugs. The code was written back at ICFP05 in Tallinn, as a challenge from Simon Peyton-Jones after I have expressed doubts about the necessity of mutually recursive instances. I am grateful to Chung-chieh Shan for many helpful discussions.

The code files below closely follow the layout of the running example in Laemmel and Peyton-Jones' ICFP2005 talk. The comments describe which part of the code is the `Library', which part of the code is the (data-structure--independent) `generic function', and which part of the code overrides the general processing of the generic function. The code has two examples; one of them is gsize of the original SYB3 paper.

Our variation of SYB3 takes advantage of the fact that type-class-bounded and higher-rank polymorphisms can be partially traded for each other. To be more precise, we use type codes to pass around functions. The type code must then be interpreted by a type class similar to Apply of HList (cf. HMap). This extra interpretation step is the main conceptual deviation from SYB3, largely taking the important idea of function combinators out of it.

Version
The current version is 1.1, Oct 5, 2005.
References
dat1.hs [4K]
This file describes an intermediate step, so to make it easier to understand the final code below. This present code still uses recursive instances, but for a different reason and only for recursive data types such as lists. We already manage without higher-rank functions.

dat2.hs [4K]
With a little bit of CPS, we now eliminate recursive instance dependency even for recursive datatypes. This is our final solution.

Re: Scrap your boilerplate and Variants
The discussion thread started from a message posted on the Haskell-Generics mailing list on Tue Oct 24 23:28:24 EDT 2006

< http://www.cs.vu.nl/boilerplate/ >
Scrap your boilerplate ... in Haskell
That web page includes the pointer to our inspiration, SYB3: Scrap your boilerplate with class: extensible generic functions by Ralf Laemmel and Simon Peyton-Jones. In: Proceedings of ICFP 2005.

 

Smash along your boilerplate

Smash is a generic programming approach based on a type-level typecase, best understood as a static dual of `Scrap your boilerplate I' (SYB1). The Smash approach is powerful to express traversals where the type of the result is computed from the type of the transformer and the type/structure of the original term. An example is replacing all Floats with Doubles in an arbitrary term, e.g., made of Maybes, tuples, lists; the result type is computed and need not be specified.

SYB1, when traversing terms and invoking user functions for subterms of a particular type, relies on a run-time typecase. The latter requires run-time type representation, which is provided by the typeclass Typeable. The typeclass implements the method `cast' for a safe cast from the `generic type' to the specific type. We observe that the typecase, at the type level, has always existed in Haskell: it is the type equality predicate TypeEq of HList.

Our generic functions, like those of SYB1, are ordinary functions. Since the transformation for each subterm is chosen at compile-time, however, we do not need any casts, run-time type representations, or Typeable. We do not need higher-ranked types either. Also unlike SYB1, the set of traversal strategies is extensible: To the pre-defined strategies for gmap, gshow, gfold and geq, the programmer may at any time add a new class of traversals.

Our most general generic function is gapp, which applies a generic function to a term. A generic function is, quite literally, made of two parts. First is a term traversal strategy, identified by a label. One strategy may be to `reduce' a term using a supplied reducing function (cf. fold over a tree). Another strategy rebuilds a term. The second component of a generic function is spec, the HList of `exceptions', or ad hoc transformations. Each element of spec is an ordinary function that tells how to transform a term of a specific type. Exceptions override the generic traversal: If the type of the input term matches the argument type of one of the functions in spec we apply that function. If no specific function applies, we do the generic action implicit in the traversal strategy.

The algorithm that selects for each input subterm the appropriate transformer is tantamount to overloading resolution algorithm. Smash thus implements typeclass instance selection algorithm in the typechecker itself. The messages below describe the implementation of typeclasses in terms of Smash, with an additional flexibility of reordering, inspecting and deleting `instances'.

Version
The current version is 1.2, Jun 5, 2007.
References
< http://darcs.haskell.org/generics/comparison/SmashA/Syb4A.hs >
The complete commented code, of the core and of several typical traversal strategies for gmap, gfold, gshow, geq. There are examples of all these strategies. The other files in that directory are sample applications and examples, within the generic programming comparison framework by Alexey Rodriguez Yakushev, Alex Gerdes, and Johan Jeuring. The code is tested with GHC 6.4.1 and 6.6.

smash-first.txt [8K]
Smash your boiler-plate without class and Typeable
The message describing the original Smash approach, posted on the Haskell mailing list on 10 Aug 2006 21:16:34 -0000

smash-along.txt [12K]
Smash along your boilerplate; how to traverse a non-existent term
The improved and generalized approach, described in the message posted on the Haskell mailing list on Tue, 5 Jun 2007 00:09:02 -0700 (PDT)

< http://www.haskell.org/haskellwiki/Applications_and_libraries/Generic_programming/Smash >
`Smash' on Haskell Wiki

 

Generic traversal of non-existent terms

The Smash generic programming approach lets the programmer add new classes of traversals. We show one such extension, lazy gmap, and its application to mapping of terms that do not exist. This lets us implement gminimum and gmaximum -- which traverse undefined and produce the fully-defined smallest (resp. largest) term of the desired type:
     gminimum () :: (Maybe Int,Either Bool (Maybe Char))
     -- ==> (Nothing,Left False)
     gmaximum () :: (Maybe Int,Either Bool (Maybe Char))
     -- ==> (Just 2147483647,Right (Just '\1114111'))

The result of traversing (undefined::(Maybe Int,Either Bool (Maybe Char))) is (Nothing,Left False). We take advantage of the fact that undefined stands for a term of any structure and is deconstructible and traversable. Haskell's non-strictness lets us meaningfully traverse non-existing terms.

Traversing non-existing sums such as undefined::Either Int Char requires special attention. Sums give us a choice. We should report the available alternatives, and let the users make the choice of a component of the sum. The implementation of gminimum always chooses the first available alternative; gmaximum chooses the last available.

The function gminimum is akin to the de-typechecker since both convert `undefined' to `defined'. De-typechecker produces only polymorphic functions, whereas gminimum yields data, including monomorphic functions.

Version
The current version is 1.1, Jun 5, 2007.
References
< http://darcs.haskell.org/generics/comparison/SmashA/Syb4ABuild.hs >
Implementation of the lazy gmap strategy and many examples.

smash-along.txt [12K]
Smash along your boilerplate; how to traverse a non-existent term
The explanation, described in the second half of the message posted on the Haskell mailing list on Tue, 5 Jun 2007 00:09:02 -0700 (PDT)

 

Generic de-serialization

To send a datum across the network or store it in a file we generally need to convert the object (`dump it') into a stream of bytes, strings or other such platform-independent units. We also should be able to reconstruct the object from such a dumped form -- deserialize it. A serialized object should generally represent both the structure and the content of the original. One may often assume however the structure (`schema', in database parlance) to be fixed, so (de)serialization need only deal with the content.

Serialization can be written generically, with generic fold. The following Smash code says the result of serializing an object is the list of all its primitive fields.

     serialize xs = gapp (TL_red concat) primitive_fields_show xs
     primitive_fields_show =
         (\ (s::Int)    -> [show s]) :+:
         (\ (s::Float)  -> [show s]) :+:
         (\ (s::String) -> [s])      :+:
         HNil                       -- more can be added in the future
For example, for the following data type of a company and a sample term (from the famous example of SYB papers)
     data Company  = C [Dept];          data Dept     = D Name Manager [Unit]
     data Unit     = PU Employee | DU Dept
     data Employee = E Person Salary;   data Person   = P Name Address
     data Salary   = S Float;           type Manager  = Employee
     type Name     = String;            type Address  = String
     
     genCom = C [D "Research" (E (P "Laemmel" "Amsterdam") (S 8000.0)) 
                  [PU (E (P "Joost" "Amsterdam") (S 1000.0)),
                   PU (E (P "Marlow" "Cambridge") (S 2000.0))],
                 D "Strategy" (E (P "Blair" "London") (S 100000.0)) []]
serialize genCom produces the list
      ["Research","Laemmel","Amsterdam","8000.0","Joost","Amsterdam",
       "1000.0","Marlow","Cambridge","2000.0","Strategy","Blair",
      "London","100000.0"]

Hugh Perkins has posed the problem of writing a generic deserializer, which should work for terms of any structure in the domain of a generic programming library. His particular application is complex terms representing XML documents. The deserializer can indeed be written generically: in the first approximation, it is a generic monadic map over a term. It takes a term, the prototype, and the list of serialized primitive fields as a monadic state. The deserializer traverses the prototype replacing each primitive field with the corresponding value from the monadic state. For example, assuming the following list of serialized fields:

     retro = ["Metaphysics", "Kant","Koeningsberg","800.0",
              "Hume","Edinburgh","100.0",
              "Marlowe","Cambridge","200.0",
              "Ruling","Thatcher","London","50000.0"]
and genCom as the prototype, deserialize genCom retro `upgrades' the company giving us the term
      C [D "Metaphysics" (E (P "Kant" "Koeningsberg") (S 800.0)) 
         [PU (E (P "Hume" "Edinburgh") (S 100.0)),
          PU (E (P "Marlowe" "Cambridge") (S 200.0))],
         D "Ruling" (E (P "Thatcher" "London") (S 50000.0)) []]

In the discussion thread, Jeremy Gibbons has pointed out that McBride and Paterson's idiomatic traverse subsumes monadic gmap. See his and Oliveira's paper below for the detailed discussion.

Version
The current version is 1.2, Jul 3, 2007.
References
< http://darcs.haskell.org/generics/comparison/SmashA/Deserialize.hs >
Complete code for serializing a term, and deserializing it using monadic generic map.

Hugh Perkins et al. Thread on the Hs-Generics mailing list, June 26-July 3, 2007.

Jeremy Gibbons and Bruno C. d. S. Oliveira. The essence of the Iterator pattern.
< http://web.comlab.ox.ac.uk/jeremy.gibbons/publications/#iterator >



Last updated October 5, 2007

This site's top page is http://pobox.com/~oleg/ftp/

oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML