From oleg at okmij.org Tue Sep  2 01:57:18 2008
To: haskell@haskell.org
Subject: The final view on typed sprintf and sscanf
Message-ID: <20080902075718.17EB1AE54@Adric.metnet.fnmoc.navy.mil>
Date: Tue,  2 Sep 2008 00:57:18 -0700 (PDT)
Status: OR


It would be remiss not to mention the dual solution to the problem of
typed sprintf and sscanf sharing the same formatting
specification. The previous message defined the embedded
domain-specific language of formatting specifications in the initial
style, as a data type. The language can also be defined in the final
style. To the end user, the difference is hardly noticeable: all the
tests of the previous message work as they are (modulo a few
adjustments caused by the monomorphism restriction). However, whereas
the initial style required GADT, the final solution is entirely in
Haskell98. One often hears that hardly anything interesting can be
written in Haskell98. I submit that implementing type-indexed terms,
thought to require GADTs or similar dependent-type-like extensions,
ought to count as interesting.

Again, the formulation of the problem and the end-user interface
remain exactly the same as described in the previous message. Here are
a few examples:

> tp1 = sprintf $ lit "Hello world"
> -- "Hello world"
> ts1 = sscanf "Hello world" (lit "Hello world")  ()
> -- Just ()

> tp2 = sprintf (lit "Hello " ^ lit "world" ^ char) '!'
> -- "Hello world!"
> ts2 = sscanf "Hello world!" (lit "Hello " ^ lit "world" ^ char) id
> -- Just '!'

> fmt3 () = lit "The value of " ^ char ^ lit " is " ^ int
> tp3 = sprintf (fmt3 ()) 'x' 3
> -- "The value of x is 3"
> ts3 = sscanf "The value of x is 3" (fmt3 ()) (\c i -> (c,i))
> -- Just ('x',3)

The only difference is the dummy unit argument to the fmt3
term, to keep it polymorphic and away from the monomorphism
restrictions. Otherwise, the examples look the same and work the same.
The complete code is available at PrintScanF.hs
It is Haskell98 and should work on any Haskell98 system (tested on GHC
6.8.2 and Hugs September 2006).

Whereas the initial version defined the formatting language with the
help of GADT, the final version uses a simple, single-parameter type
class

> class FormattingSpec repr where
>     lit  :: String -> repr a a
>     int  :: repr a (Int -> a)
>     char :: repr a (Char -> a)
>     (^)  :: repr b c -> repr a b -> repr a c
> infixl 5 ^

The printer and the scanner are two interpreters of the language

> newtype FPr a b = FPr ((String -> a) -> b)
> instance FormattingSpec FPr where
>     lit str = FPr $ \k -> k str
>     char    = FPr $ \k -> \x -> k [x]
>     (FPr a) ^ (FPr b)  = FPr $ \k -> a (\sa -> b (\sb -> k (sa ++ sb)))

> newtype FSc a b = FSc (String -> b -> Maybe (a,String))
> instance FormattingSpec FSc where
>     lit str = FSc $ \inp x -> 
> 		      maybe Nothing (\inp' -> Just (x,inp')) $ prefix str inp
>     char    = FSc $ \inp f -> case inp of
> 		                 (c:inp)  -> Just (f c,inp)
> 		                 ""       -> Nothing
>     (FSc a) ^ (FSc b) = FSc $ \inp f ->
> 	  maybe Nothing (\(vb,inp') -> b inp' vb) $ a inp f

> sprintf :: FPr String b -> b
> sprintf (FPr fmt) = fmt id
> sscanf :: String -> FSc a b -> b -> Maybe a
> sscanf inp (FSc fmt) f = maybe Nothing (Just . fst) $ fmt inp f

The transformation from the initial to the final style follows the
correspondence described in
	http://okmij.org/ftp/tagless-final/tagless-typed.html#in-fin
In fact, PrintScanF.hs was `derived' from PrintScan.hs by Emacs `code
movements'. Once the syntax errors have been fixed, the code
worked on the first try.

One can easily define a yet another interpreter, to convert the
formatting specification to a C-like format string.