The motivation for the present project was to answer a
challenge to write universally polymorphic list processing functions
in ANSI C. These functions (not macros!) must be truly polymorphic [CARD85], must not rely on compiler extensions, and
must not subvert the type system of C by performing downcasts and
especially upcasts from void *
. One interesting outcome
is a demonstration of a limited functional-style programming in C, in
particular, of using higher-order polymorphic combinator fold
. Another conclusion is regarding sscanf()
and similar functions as coercion operators.
A weaker form of the challenge was set by Matthias Blume in his article posted on comp.lang.functional on Sep 29, 2000:
Write a function that measures the length of any list regardless of
its element type (and not even knowing yet what types may occur)
without using void *
.
The corresponding code [poly-list-v1.c] answers the
challenge. The only casts used are those from calloc()
and from 0. They are safe; they are present just to make the compiler
happier. The code compiles as gcc -W -Wall poly-list.c
with gcc emitting not a single
warning. Usually, given the -Wall
flag, gcc is quite annoying. When you run the resulting executable,
you will see
Testing Polymorphic lists Length of NIL is 0 Length of iota(5) is 5 Length of iota_str(42) is 42
Here iota(n)
creates a list of integers 0 though n-1
. Function iota_str(n)
returns a list of strings
: numbers 0 .. n-1
in their ASCII
representation.
Another answer to the challenge was posted by Joachim Durchholz on the discussion thread [Disc-thread].
Later on, Matthias extended and strengthen the challenge:
For example, write a C function that will non-destructively reverse any list regardless of its element type. Note that I said "function", not "macro". I want to pass the beast as an argument to another function, for example.
A source code file [poly-list.c] gives the answer. It
shows that the approach taken in solving the previous challenge can
indeed be generalized. The polymorphic list length code did not have
to retrieve elements of a list -- it merely had to count them. The
polymorphic list reverse on the other hand must be able to fetch and
store a data item, no matter what its type type and size are. That is
why the second challenge is stronger. Nevertheless, the task is
possible as the code shows, without using widening (i.e., upcasts) and
without even mentioning void *
. Again, the code compiles
with gcc -W -Wall
without a single warning. Built-in test
regression test cases run and print the expected results.
A truly polymorphic list is defined as
typedef struct s_LHead { /* A fixed-size structure */ struct s_LHead * next; int type_tag; /* Not strictly necessary, but nice to have */ int size; /* The size of the data below */ char * data; /* The data themselves, in an _external_ form as a character string */ } LHead;The
data
are allocated as a string of size size
on the heap. The data themselves do not have to be a part of the structure.
The test cases built into the code:
int main(void) { LHead * int_l = range(1,11); LHead * fl_l = range_step(0.1,10.1,0.5); printf("Testing Polymorphic lists\n"); printf("\nThe length of an empty list: %d\n",length(NIL)); printf("\nA list of integers: "); LInt_print(int_l); printf("\nIts length is %d", length(int_l)); printf("\nIts reverse is: "); LInt_print(reverse(int_l)); printf("\nIts sum is %d\n",LInt_sum(int_l)); printf("\nA list of floats: "); LFl_print(fl_l); printf("\nIts length is %d", length(fl_l)); printf("\nIts reverse is: "); LFl_print(reverse(fl_l)); printf("\nIts sum is %g\n",LFl_sum(fl_l)); return 0; }print the expected results.
The printing and summation functions are monomorphic: there is a
pair of such functions -- LInt_print()
and LInt_sum()
for a list of integers, and the corresponding pair for
a list of floating-point numbers. On the other hand, length()
and reverse()
functions are polymorphic.
There is a single version of these functions, which can process lists
of any
data type. A function LHead * range(const
int i_from, const int i_to)
generates a list of integers i_from .. i_to-1
; a function LHead * range_step(const double from, const double to, const double step)
makes a list of floating-point numbers.
The code file [poly-list.c] conforms to ANSI C and
satisfies the challenge. No subversive casts are used -- in fact,
there are no casts at all except for trivial constants (char
*)0
and ((LHead *)0)
. The code deliberately avoids
functions calloc()
and free()
because they
return or take values of a type void*
, which require
down- or up-casting to the proper type. A not-quite-standard function
snprintf()
in the code can be safely replaced by the
ANSI-standard function sprintf()
. I however prefer the
former as sprintf()
must die.
If we broaden our solution further we come to Tcl. The most
generic representation of a (possibly polymorphic) ordered collection
of items is a single character string. The string lists each item in
its printed form and separates them by whitespace(s). Whitespaces that are a
part of an item representation are escaped using backslashes or braces. Items
that are sequences of arbitrary bytes are base64-encoded. Such
a representation is not very efficient, yet Tcl until version 8 used
exactly that. This universal textual representation of any data structure
clearly lends itself to generic, polymorphic processing. The corresponding
C code will operate only on char
and char *
,
and will therefore be statically type-safe. This is yet another
evidence that dynamic typing is a "specialization" of static typing,
as Robert Harper has asserted.
The polymorphic list code has elicited a few interesting comments:
atoi()
. However,
the challenge has never mentioned any performance requirements.assert( elem != NIL && elem->type_tag == LFl_type );
The polymorphic list code defines a FOLD combinator, in a type-safe manner. FOLD is a higher-order polymorphic C "function". Of course it must be a macro if it is to be higher-order and polymorphic:
#define XY(X,Y) X##Y #define MakeNameXY(FX,LINE) XY(FX,LINE) #define MakeName(FX) MakeNameXY(FX,__LINE__) #define FOLD(List, Functor, Seed_var) \ { LHead * MakeName(L_H_e_a_d) = List; \ for(;MakeName(L_H_e_a_d) != NIL; MakeName(L_H_e_a_d)=MakeName(L_H_e_a_d)->next) \ Seed_var = Functor(MakeName(L_H_e_a_d),Seed_var); }The FOLD combinator takes a List, a Functor, and a Seed variable. The Functor should have a signature
a Node * b -> b
. Therefore, the type of FOLD is a List * (a Node * b -> b) * b Ref -> ()
. It is
polymorphic indeed. If the List is empty, the Seed_var is not modified.
Otherwise, it is set toFunctor(List.e1,Functor(List.e2, ... Functor(list.en,Seed_var)))
For example, this is how the sum of all elements of a list of floating-point numbers is computed:
static double LFl_add(LHead * elem, const double old_total) { assert( elem != NIL && elem->type_tag == LFl_type ); return old_total + atof(elem->data); } /* Sum all elements of the list */ static double LFl_sum(LHead * list) { double total = 0; FOLD(list,LFl_add,total); return total; }
Even a dull compiler will inline the LFl_add application, which will make the loop as efficient as it can be. Summation of the elements of an integer list is similar. Again, no casts are used.
I never thought of sscanf()
as a coercion
operator. Come to think of it, serialization (pickling) is a
polymorphic type conversion operation, from a concrete type to a
universal character-string type.
It has never before occurred to me to use the FOLD combinator in C. It appears there is a point to this exercise after all: functional programming helps in writing even C code.
Throughout the discussion thread I was puzzled by how much we can argue about the matter that we all basically agree upon. A type is a set, or a membership function. Actually it is a composition of two functions: one is evaluated at compile time, the other is computed at run time. Languages differ in ways they decompose type membership functions into the two parts. What makes C special is that its type system -- besides being weak -- is remarkably easily subverted. Furthermore, this subversion is a very common practice, deeply ingrained in a C programmer (I speak for myself). The motivation to answer the challenges was to illustrate that it is possible to decompose the type function of polymorphic lists in such a way that the compile-time part will be in the perfect agreement with the C type system. Of course because the type system of C is so weak, "the burden of proof" gets shifted onto the run time. The exercise also showed just how hard it is to program in C without explicit or implicit casts.
The present page is a compilation of three articles posted on comp.lang.functional on Sep 29 and Oct 1 and 2, 2000.
[poly-list-v1.c] The first version of the code
<poly-list-v1.c> [1K]
[poly-list.c] The final version of the code
<poly-list.c> [7K]
[Disc-thread] Discussion thread Re: FP's and OO and generic programming
on comp.lang.functional
<http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=4b87b55f932059d4&seekm=8ravsq%24c2t%241%40nnrp1.deja.com#p>
[CARD85] Luca Cardelli, P. Wegner, On Understanding Types, Data Abstraction,
and Polymorphism. ACM Comp. Surveys, 17(4):471-522, Dec. 1985
This site's top page is http://okmij.org/ftp/
oleg-at-okmij.orgConverted from SXML by SXML->HTML