printf
was the main example
printf
and scanf
of C -- are present in many languages. With the exception of Danvy's functional unparsing and its implementation in SML/NJ, formatted input/output facilities are either type-unsafe or not applicative. Even in Haskell the type checker does not stop the programmer from passing to printf
more or fewer arguments than required by the format string. OCaml supports typed sprintf
and sscanf
but relies on a particular weird typing of these functions and on ad hoc overloading of string literals.Our goal is type-safe formatted i/o as an ordinary library of formatters, format parsers and format descriptors. Our library interface is essentially that of OCaml formatted i/o, without the weird typing. For concreteness, we will be discussing sprintf
and sscanf
. Other formatted i/o functions are similar. The function sprintf
takes as arguments a format descriptor and the values to format, returning the formatted string as the result. The types and the number of the other arguments of the function depend on the format descriptor. Conversely, sscanf
receives as arguments the input string, a format descriptor and a consumer function. The function sscanf
parses the received string according to the format descriptor and passes the extracted data to the consumer function, whose result is returned. Again, the number and the types of the arguments to the consumer function depend on the format descriptor. The function sscanf
is partial because parsing may fail. We further require format descriptors to be first-class: we should be able to pass them as arguments, return as results, and build incrementally.
The function sprintf
is polyvariadic: it accepts an
arbitrary number of arbitrarily typed arguments. Furthermore,
the first argument to sprintf
determines the number and the types of
the subsequent arguments. It seems writing such a function requires dependent
types. Olivier Danvy's surprising discovery was type-safe sprintf
written as an ordinary SML function. Hinze and Asai demonstrated more
type-safe applicative formatters. Although sscanf
takes the fixed
number of arguments, the consumer function passed as the last argument
does not have the fixed type. The number and the types of its arguments
are determined by the format descriptor. Again dependent types
seem to be called for. The fact that one can write type-safe sscanf
in any mainstream functional language is less known. Apparently it has been
unknown if type-safe sprintf
and sscanf
could share the same
format descriptor.
Ralf Hinze: Formatting: A Class Act
J. Functional Programming, 2003, v13, N5, pp. 935-944.
Kenichi Asai: On Typing Delimited Continuations: Three New Solutions to the Printf Problem
Ochanomizu U. Technical Report OCHA-IS 08-2, September 2008, 17 pp.
Higher-Order and Symbolic Computation, 2009, v22, N3, pp. 275--291.
< http://pllab.is.ocha.ac.jp/~asai/papers/tr08-2.pdf >
sprintf
and sscanf
sharing the same format descriptor. The gist of our surprisingly trivial solution is a simple embedded domain-specific language (DSL) of format descriptors. The functions sprintf
and sscanf
are two interpreters of the language, building or parsing a string according in the given format. The format descriptor, a term in our DSL, can be interpreted in far more than two ways. For example, one may write an interpreter that counts the number of integer descriptors or converts a format descriptor to a C format string. One can write sscanf
-like functions whose input comes from a file, a buffer, etc.This article uses initial tagless typed embedding of the formatting DSL into Haskell. The terms of the language are represented as GADTs. We also demonstrate that lambda-abstractions at the type level are expressible already in the Hindley-Milner type system; GADT with the inherent polymorphic recursion help us use these abstractions.
Here is the typical example of formatting and unformatting:
fmt3 = lit "The value of " ^ char ^ lit " is " ^ int tp3 = sprintf fmt3 'x' 3 -- "The value of x is 3" ts3 = sscanf "The value of x is 3" fmt3 (\c i -> (c,i)) -- Just ('x',3)A format descriptor is built by connecting primitive descriptors (such as
lit "string"
, int
, char
) with (^)
. Whereas sprintf $ lit "Hello world"
has the type String
, sprintf $ fmt3
has the type Char -> Int -> String
. Likewise, the type of the consumer of the values parsed by sscanf
varies with the format descriptor. Our sprintf
and sscanf
indeed use the same format descriptor, fmt3
, which is a first-class value.
The implementation is based on Hinze's observation
(`Formatting: a class act' JFP 2003) that the format descriptor
is a `type transformer', a functor. For example, the format
descriptor int
is associated with a functor transforming the
type t
to the type Int -> t
. Hinze represented functors in
Curry style: the functor associated with int
is then (Int ->)
.
We use a more direct, Church-style representation.
Let us consider a type F a b
where F
is a binary type constructor,
a
is a type variable, and b
is a type that may include one or more
occurrences of a
. Such F a b
may be regarded a type abstraction,
a type-level lambda-function. To apply that function to an argument,
we unify a
with the argument; the second parameter of F
becomes the result of the application. This idea comes from the
encoding of lambda-terms in Prolog as pairs (X,body)
whose first
component is a logic variable. Even the standard lambda-notation
is a fancy way of pairing a variable with the function body.
GADT.ml [15K]
The same code in OCaml (see Example 1 in the file)
PrintScanI.txt [6K]
The initial view on typed sprintf and sscanf
The message with more explanations, posted on the Haskell mailing list on Sun, 31 Aug 2008 19:40:41 -0700 (PDT)
sprintf
and sscanf
sharing the same format descriptor. Previously, we defined the embedded domain-specific language (DSL) of format descriptors in the initial style, as a data type. The language can also be defined in the final style. To the end user, the difference is hardly noticeable: all the examples of the initial solution work as they are (modulo a few adjustments caused by the monomorphism restriction). However, whereas the initial style required GADT, the final solution is entirely in Haskell98. Despite often-heard disparagement, Haskell98 is powerful enough for type-indexed terms, thought to require GADTs or similar dependent-type-like extensions.The transformation from the initial to the final style follows the correspondence referenced below. In fact, the final implementation was `derived' from the initial implementation by Emacs `code movements'. Once the syntax errors have been fixed, the code worked on the first try.
PrintScanF.txt [4K]
The final view on typed sprintf and sscanf
The message posted on the Haskell mailing list on Tue, 2 Sep 2008 00:57:18 -0700 (PDT)
Text.Printf.printf
is inspired by the message of Evan Klitzke, who wrote on Haskell-Cafe about frequent occurrences in his code of the lines likeinfoM $ printf "%s saw %s with %s" (show x) (show y) (show z)Writing
show
on and on quickly becomes tiresome. It turns out, we can avoid these repeating show
, still conforming to Haskell98.Our polyvariadic generic printf
is like polyvariadic show
with the printf
-like format string. Our printf
handles values of any present and future type for which there is a Show
instance. For example:
t1 = unR $ printf "Hi there" -- "Hi there" t2 = unR $ printf "Hi %s!" "there" -- "Hi there!" t3 = unR $ printf "The value of %s is %s" "x" 3 -- "The value of x is 3" t4 = unR $ printf "The value of %s is %s" "x" [5] -- "The value of x is [5]"The unsightly
unR
appears solely for Haskell98 compatibility: flexible instances remove the need for it. On the other hand, Evan Klitzke's code post-processes the result of formatting with infoM
, which can subsume unR
.A bigger problem with our generic printf
, shared with the original Text.Printf.printf
, is partiality: The errors like passing too many or too few arguments to printf
are caught only at run-time. We can certainly do better.
printf
that takes a C-like format string and the variable number of the subsequent arguments. Unlike C of Haskell's printf
, ours is total: if the types or the number of the subsequent arguments, the values to format, do not match the format string, a type error is reported at compile time. To the familiar format descriptors %s
and %d
we add %a
to format any showable value. The latter is like the format descriptor ~a
of Common Lisp. Likewise, we build scanf
that takes a C-like format string and the consumer function with the variable number of arguments. The types and the number of the arguments must match the format string; a type error is reported otherwise.Our approach is a variation of the safe printf
and scanf
described elsewhere on this page. We use Template Haskell to translate the format string to a phrase in the DSL of format descriptors. We use the final approach to embed that DSL into Haskell.
Unlike the safe printf
explained in the Template Haskell documentation, in our implementation, format descriptors are first class. They can be built incrementally. The same descriptor can be used both for printing and for parsing. Our printf
and scanf
are user-extensible: library users can write functions to direct format output to any suitable data sink, or to read parsed data from any suitable data source such as a string or a file. Finally, the result of formatting is parseable back using the same format descriptor.
Here are some of the tests from the test collection referenced below. The evaluation result is given in the comments below each binding. Example t31
shows that format descriptors are indeed first-class. The definition t32
, when uncommented, raises the shown type error because the format descriptor does not match the type of the corresponding argument.
t4 = let x = [9,16,25] i = 2 in sprintf $(spec "The element number %d of %a is %a") i x (x !! i) -- "The element number 2 of [9,16,25] is 25" t4s = sscanf "The element number 2 of [9,16,25] is 25" $(spec "The element number %d of %a is %a") (\i a e -> (i,a::[Int],e::Int)) -- Just (2,[9,16,25],25) -- What we print we can parse, using the same descriptor t31 = let printed = sprintf desc "x" 3 parsed = sscanf printed desc (\s i -> (s,i)) in (printed, parsed) where desc = $(spec "The value of %s is %d") -- ("The value of x is 3",Just ("x",3)) -- t32 = sprintf $(spec "The value of %s is %d") "x" True -- Couldn't match expected type `Bool' against inferred type `Int'
TotalPrintF.hs [4K]
The code to convert format strings to DSL phrases
printf
function, which converts a sequence of heterogeneous arguments to a string according to a given format. The dual problem is to implement a type-safe scanf
function, which extracts a sequence of heterogeneous arguments from a string by interpreting (Friedman & Wand LFP 1984, Friedman & Wand 2008) the same format as an equally heterogeneous sequence of patterns that binds zero or more variables. We derive multiple solutions to both problems from their formal specifications (Wand JACM 1980, TCS 1982).On one hand, our solutions show how the Hindley-Milner type system, unextended, permits accessing heterogeneous sequences with the static assurance of type safety. On the other hand, our solutions demonstrate the use of control operators (Felleisen et al. LFP 1988, Meyer & Wand 1985) to communicate with formats as coroutines (Wand LFP 1980, Haynes et al. LFP 1984).
This is joint work with Kenichi Asai and Chung-chieh Shan.
Chung-chieh Shan. Slides of the talk presented at the Symposium in honor of Mitchell Wand. August 23, 2009, Boston.
< http://www.cs.rutgers.edu/~ccshan/binding/printf-scanf-slides.pdf >
derive5.ml [26K]
An extensively commented OCaml code explaining the step-wise derivation of typed printf
and scanf
. The code derives several novel implementations of printf
that rely on multi-prompt delimited continuations.
Derive5.hs [4K]
Deriving type-safe printf
in direct style, using shift/reset with effectful typing that supports the answer-type modification and polymorphism. The code rigorously derives printf
of Asai (2009) along with an implementation that encloses in reset
the whole application of printf
to values to format.
derive6.ml [8K]
Deriving a context-passing implementation of printf
printf
is a good example to illustrate the
deep difference between genuine dependent types and their
sometimes very close emulations in OCaml, Haskell and seemingly
GCC. We will see how to derive these emulations, how we gain progressively
simpler types and what we have to sacrifice.
On the surface of it, typing the C-like formatter printf
requires
dependent types. After all, the type of printf fmt
(the number and
the types of its arguments) depends on the value of the format
string, fmt
. Listed on this page however are many implementations of
type-safe printf
in the Hindley-Milner system, OCaml and Haskell.
One way or another, first-class delimited continuation appeared,
either directly or encoded in continuation-passing style (CPS). Does
it mean that delimited continuations somehow `implement' dependent
types?
Further, OCaml has its own type system for typed formatting, refusing
to accept, for example, printf "the value of x is %d" 1.0
because 1.0
has the type float
rather than the expected int
. Even GCC
issues a warning in similar cases. Does it mean that OCaml or
GCC are dependently typed?
The answer to all these questions is negative. The C-like type-safe printf
does require dependent types, which neither OCaml nor
Haskell, let alone GCC, have. A simple test will demonstrate
this, hence distinguishing a truly dependently-typed system from a
fake. We systematically construct two archetype fake dependently-typed printf
s, tracing the progression from dependent types to fancy
non-dependent types to the Hindley-Milner system, and discussing what
burden we shift from the type checker to the user and what we lose in
expressiveness.
Recall that in C, printf
is a polyvariadic function whose first
argument is the format string, containing ordinary characters and
conversion specifications. The latter are introduced by the %
character and define how to format the corresponding printf
argument.
The number and the types of the subsequent arguments to printf
depend on the number and the content of conversion
specifications in the format string. Hence printf
has the
dependent-function type Pi x:String. PA x
, which is also written as {x:String} -> PA x
. Here x
is a value, String
is a Type (that
is, String :: *
) and PA :: String -> *
, which computes the type of printf
from the contents of the format string, has the dependent
kind. With such a printf
, we can give a type to the function
fts x = "The formatting result is " ++ sprintf ("%" ++ x) 1namely,
fts :: {x:String | isPrefix "d" x && not (elem '%' x)} -> Stringusing so-called qualified types. To avoid the irrelevant i/o, we will be silently using a version of
printf
that returns the
formatted result as a string rather than outputs it; that version is
often called sprintf
. Dependent-functions and qualified types are not
available in OCaml or Haskell. Even if Haskell let us write the fts
signature,
it could not have checked that the signature matches the code of fts
.
Although OCaml can type check simple printf
expressions as
shown earlier, OCaml cannot type check fts
:# let fts x = "The formatting result is " ^ sprintf ("%" ^^ x) 1;; ^^^ Error: Premature end of format string ``"%"''At the opposite extreme, GCC accepts the C version of
fts
with no
warnings. Just as happily GCC accepts the invocations of fts
that lead
to segmentation faults. None of the mainstream languages admit fts
with a sound type, statically ensuring crash-free applications
of the function.
The example fts
confirmed that the C-like type-safe printf
requires dependent types. There are however approximations, also type
safe, that do not impose such requirement. We now design two
characteristic approximations. With dependently-typed sprintf
, sprintf "Hello"
has the
type String
and sprintf "The value of %c is %d"
has the different
type Char -> Int -> String
. And yet in both cases the format
descriptor had the same type String
. Therefore, the type checker has
to parse and reason about the content of that string, its value. In
fact, the OCaml error message about fts
above betrays type checker's
actual parsing of the format string. Alas, OCaml lacks the type system
to express the result of parsing "%"
.
What if we parse the format string ourselves, separating out the literal
text and explicating conversion specifications? Instead of "The value of %c is %d"
we would give printf
as the format descriptor
the following parsed form:
fmtG = LIT "The value of " :^ CHAR :^ LIT " is " :^ INTEach component,
LIT str
, CHAR
, INT
, etc. can then be annotated with a
distinct type; the connector (:^)
will combine the annotations:data Desc a where LIT :: String -> Desc () INT :: Desc Int CHAR :: Desc Char (:^) :: Desc a -> Desc b -> Desc (a,b)Since each data constructor builds
Desc a
of a different type,
GADTs are called for. The inferred type fmtG :: Desc ((((), Char), ()), Int)
is a concise summary of fmtG
, describing the contained conversion specifications and their
sequence. The type abstracts over the content of LIT
descriptors, which
does not affect the type of printf
. The type of fmtG
contains just the right amount of information to compute the
type of printf
. No longer the type checker has to parse format strings.
It can work entirely from the type annotations:printf :: Desc a -> PS a type PS a = PFormat a String type family PFormat a w :: * type instance PFormat () w = w type instance PFormat Int w = Int -> w type instance PFormat Char w = Char -> w type instance PFormat (a,b) w = PFormat a (PFormat b w)Here
PS :: * -> *
and the auxiliary PFormat :: * -> * -> *
are non-dependent, purely type functions. Their arguments are types
(kind *
) rather than terms such as String
.
There are people who would call a language with type functions
`dependently typed'. After all, the type Desc a
reflects the
particular value of the format descriptor. For example, the type Desc
Int
corresponds to the value INT
, and no other value has that
type. Since the value of the format descriptor influences the type so
directly and unambiguously, it seems the type of printf
sort of
depends on the format descriptor value. This dependence however is not
sufficient to type check fts
. Our emulation gives printf
the
already parsed format string. We have shifted the burden of parsing
the format string from the type checker to the programmer -- gaining
simpler, non-dependent typing but losing the flexibility of building
arbitrary conversion specifications such as those in fts
.
The signature of printf
guides, even compels the implementation. Just
as PS
used an auxiliary type function PFormat
, printf
relies
on the auxiliary function interp
to interpret the format descriptor.
The structure of PFormat
drives the implementation of interp
:
printf desc = interp desc id interp :: Desc a -> (String -> w) -> PFormat a w interp (LIT str) = \k -> k str interp INT = \k -> k . show interp CHAR = \k -> k . \c -> [c] interp (x :^ y) = \k -> interp x (\sx -> interp y (\sy -> k (sx ++ sy)))
The interpreter is clearly compositional: the interpretation of a descriptor
depends only on the components of the descriptor regardless of the overall
context. The compositionality gives us an idea to name the applications
of interp
to primitive descriptors:
lit :: String -> (String -> w) -> w -- cf. PFormat () w = w lit str = \k -> k str -- which is, interp (LIT str) int :: (String -> w) -> (Int -> w) -- cf. PFormat Int w = Int -> w int = \k -> k . show -- which is, interp INT char :: (String -> w) -> (Char -> w) -- cf. PFormat Char w = Int -> w char = \k -> k . \c -> [c] -- which is, interp CHARand introduce the operator
(^)
for combining these interpreted descriptors.
The operator must satisfy the equalityinterp (desc1 :^ desc2) === (interp desc1) ^ (interp desc2)which immediately gives us the implementation
(^) :: ((String -> t1) -> t) -> ((String -> t2) -> t1) -> (String -> t2) -> t (^) x y = \k -> x (\sx -> y (\sy -> k (sx ++ sy)))
Instead of writing the format descriptor as a GADT such as fmtG
, we
write the interpreted descriptor, interp fmtG
, directly in terms of
the interpreted primitive descriptors lit
, int
and char
:
-- The interpreted descriptor: interp fmtG fmtI = lit "The value of " ^ char ^ lit " is " ^ int -- fmtI :: (String -> w) -> Char -> Int -> w
The main function printf
now receives the already interpreted descriptor
as the argument, so it becomes
printf' idesc = idesc idwhose sample application is
tIr = printf' fmtI 'x' 3 -- "The value of x is 3"
We have thus derived one of Danvy's original implementations of
type-safe printf
. By making the programmer write interpreted format
descriptors (which is hardly a burden in this case: fmtI
requires
fewer keystrokes than fmtG
) we gain even simpler typing. We no
longer need the GADT Desc a
; furthermore, the type function PFormat
is incorporated into the types of lit
, int
, etc. and
does not have to be defined explicitly. The type-safe printf'
and fmtI
are hence typeable in the Hindley-Milner system. Since we have
wired-in the interpreter interp
, we have lost the ability to change
it, that is, to interpret the format descriptor in a different way --
to parse input, to format to a network pipe, etc. Incidentally, we
regain the ability to change the interpreter if we abstract over it,
using type classes -- obtaining the tagless-final type-safe printf
.
The interpreter interp
is written in the continuation k
passing
style -- which is forced upon us by the structure of PFormat
. The last
clause of that type function
type instance PFormat (a,b) w = PFormat a (PFormat b w)says that format descriptors are interpreted in sequence: first the descriptor
a
receives and formats its arguments, then
the descriptor b
. In other words, the meaning of the annotation (a,b)
in the context w
is the meaning of a
in the context (PFormat b w)
.
CPS immediately springs to mind. Incidentally, the inferred type for (^)
looks exactly like the type for string concatenation in the
type-and-effect system of Danvy and Filinski for shift
.
In conclusion, we showed a simple example of printf
that is only
typeable in a dependently-typed system. The example, fts
, is the
touchstone test for dependent types, distinguishing genuine
dependently-typed systems from emulations. We have described a method
to derive these emulations -- progressively shifting the burden of
parsing and interpreting the format string from the type checker to the
programmer -- and hence moving from dependent functions to
non-dependent type functions to no type-functions at all. Luckily for printf
, the loss in expressiveness is minimal and the burden is
light and arguably worth carrying anyway.
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML