Truly polymorphic lists in C

Introduction
Measuring length of a polymorphic list
Polymorphic list processing
Fold combinator in C
Conclusions and lessons learned
References

Introduction

The motivation for the present project was to answer a challenge to write universally polymorphic list processing functions in ANSI C. These functions (not macros!) must be truly polymorphic [CARD85], must not rely on compiler extensions, and must not subvert the type system of C by performing downcasts and especially upcasts from void * . One interesting outcome is a demonstration of a limited functional-style programming in C, in particular, of using higher-order polymorphic combinator fold . Another conclusion is regarding sscanf() and similar functions as coercion operators.

Measuring length of a polymorphic list

A weaker form of the challenge was set by Matthias Blume in his article posted on comp.lang.functional on Sep 29, 2000:

Write a function that measures the length of any list regardless of its element type (and not even knowing yet what types may occur) without using void * .

The corresponding code [poly-list-v1.c] answers the challenge. The only casts used are those from calloc() and from 0. They are safe; they are present just to make the compiler happier. The code compiles as gcc -W -Wall poly-list.c with gcc emitting not a single warning. Usually, given the -Wall flag, gcc is quite annoying. When you run the resulting executable, you will see

     Testing Polymorphic lists
     Length of NIL is 0
     Length of iota(5) is 5
     Length of iota_str(42) is 42

Here iota(n) creates a list of integers 0 though n-1 . Function iota_str(n) returns a list of strings : numbers 0 .. n-1 in their ASCII representation.

Another answer to the challenge was posted by Joachim Durchholz on the discussion thread [Disc-thread].

Polymorphic list processing

Later on, Matthias extended and strengthen the challenge:

For example, write a C function that will non-destructively reverse any list regardless of its element type. Note that I said "function", not "macro". I want to pass the beast as an argument to another function, for example.

A source code file [poly-list.c] gives the answer. It shows that the approach taken in solving the previous challenge can indeed be generalized. The polymorphic list length code did not have to retrieve elements of a list -- it merely had to count them. The polymorphic list reverse on the other hand must be able to fetch and store a data item, no matter what its type type and size are. That is why the second challenge is stronger. Nevertheless, the task is possible as the code shows, without using widening (i.e., upcasts) and without even mentioning void * . Again, the code compiles with gcc -W -Wall without a single warning. Built-in test regression test cases run and print the expected results.

A truly polymorphic list is defined as

     typedef struct s_LHead {        /* A fixed-size structure */
       struct s_LHead * next;
       int type_tag;         /* Not strictly necessary, but nice to have */
       int size;             /* The size of the data below */
       char * data;          /* The data themselves, in an _external_ form
                                as a character string */
     } LHead;

The data are allocated as a string of size size on the heap. The data themselves do not have to be a part of the structure.

The test cases built into the code:

     int main(void)
     {
       LHead * int_l = range(1,11);
       LHead * fl_l = range_step(0.1,10.1,0.5);
      
       printf("Testing Polymorphic lists\n");
       printf("\nThe length of an empty list: %d\n",length(NIL));
      
       printf("\nA list of integers: "); LInt_print(int_l);
       printf("\nIts length is %d", length(int_l));
       printf("\nIts reverse is: "); LInt_print(reverse(int_l));
       printf("\nIts sum is %d\n",LInt_sum(int_l));
      
       printf("\nA list of floats: "); LFl_print(fl_l);
       printf("\nIts length is %d", length(fl_l));
       printf("\nIts reverse is: "); LFl_print(reverse(fl_l));
       printf("\nIts sum is %g\n",LFl_sum(fl_l));
       return 0;
     }

print the expected results.

The printing and summation functions are monomorphic: there is a pair of such functions -- LInt_print() and LInt_sum() for a list of integers, and the corresponding pair for a list of floating-point numbers. On the other hand, length() and reverse() functions are polymorphic. There is a single version of these functions, which can process lists of any data type. A function LHead * range(const int i_from, const int i_to) generates a list of integers i_from .. i_to-1 ; a function LHead * range_step(const double from, const double to, const double step) makes a list of floating-point numbers.

The code file [poly-list.c] conforms to ANSI C and satisfies the challenge. No subversive casts are used -- in fact, there are no casts at all except for trivial constants (char *)0 and ((LHead *)0) . The code deliberately avoids functions calloc() and free() because they return or take values of a type void* , which require down- or up-casting to the proper type. A not-quite-standard function snprintf() in the code can be safely replaced by the ANSI-standard function sprintf() . I however prefer the former as sprintf() must die.

If we broaden our solution further we come to Tcl. The most generic representation of a (possibly polymorphic) ordered collection of items is a single character string. The string lists each item in its printed form and separates them by whitespace(s). Whitespaces that are a part of an item representation are escaped using backslashes or braces. Items that are sequences of arbitrary bytes are base64-encoded. Such a representation is not very efficient, yet Tcl until version 8 used exactly that. This universal textual representation of any data structure clearly lends itself to generic, polymorphic processing. The corresponding C code will operate only on char and char * , and will therefore be statically type-safe. This is yet another evidence that dynamic typing is a "specialization" of static typing, as Robert Harper has asserted.

The polymorphic list code has elicited a few interesting comments:

"Your list stores string representations of integers/doubles/etc and not integers or doubles per se.": The meaning of 'doubles per se' is in the eye of the beholder. To a x386 processor that does not have a FP unit, a double is not native. The above objection applies to any programming language system that employs boxed integers and floats. Granted, unboxing an integer is usually less expensive than atoi() . However, the challenge has never mentioned any performance requirements.
Regarding assert( elem != NIL && elem->type_tag == LFl_type ); "Your list is not statically type-safe": We need to consider the meaning of types in the polymorphic list code. A syntactically correct Scheme code can be called statically type-safe, as the only type immediately visible to the compiler is the universal type (some Scheme compilers may of course be more discerning). The same follows for any language with latent types. Even Java with its extensive static type system is not statically type-safe, and has to introduce run-time type checks. Again, the definition of the challenge does not ascribe a particular type to the list and does not set the exact boundary between compile-time and run-time validity checks.

Fold combinator in C

The polymorphic list code defines a FOLD combinator, in a type-safe manner. FOLD is a higher-order polymorphic C "function". Of course it must be a macro if it is to be higher-order and polymorphic:

     #define XY(X,Y) X##Y
     #define MakeNameXY(FX,LINE) XY(FX,LINE)
     #define MakeName(FX) MakeNameXY(FX,__LINE__)
     
     #define FOLD(List, Functor, Seed_var) \
       { LHead * MakeName(L_H_e_a_d) = List; \
         for(;MakeName(L_H_e_a_d) != NIL; MakeName(L_H_e_a_d)=MakeName(L_H_e_a_d)->next) \
           Seed_var = Functor(MakeName(L_H_e_a_d),Seed_var); }

The FOLD combinator takes a List, a Functor, and a Seed variable. The Functor should have a signature a Node * b -> b . Therefore, the type of FOLD is a List * (a Node * b -> b) * b Ref -> () . It is polymorphic indeed. If the List is empty, the Seed_var is not modified. Otherwise, it is set to

     Functor(List.e1,Functor(List.e2, ... Functor(list.en,Seed_var)))

For example, this is how the sum of all elements of a list of floating-point numbers is computed:

     static double LFl_add(LHead * elem, const double old_total)
     {
       assert( elem != NIL && elem->type_tag == LFl_type );
       return old_total + atof(elem->data);
     }
     
     /* Sum all elements of the list */
     static double LFl_sum(LHead * list)
     {
       double total = 0;
       FOLD(list,LFl_add,total);
       return total;
     }

Even a dull compiler will inline the LFl_add application, which will make the loop as efficient as it can be. Summation of the elements of an integer list is similar. Again, no casts are used.

Conclusions and lessons learned

I never thought of sscanf() as a coercion operator. Come to think of it, serialization (pickling) is a polymorphic type conversion operation, from a concrete type to a universal character-string type.

It has never before occurred to me to use the FOLD combinator in C. It appears there is a point to this exercise after all: functional programming helps in writing even C code.

Throughout the discussion thread I was puzzled by how much we can argue about the matter that we all basically agree upon. A type is a set, or a membership function. Actually it is a composition of two functions: one is evaluated at compile time, the other is computed at run time. Languages differ in ways they decompose type membership functions into the two parts. What makes C special is that its type system -- besides being weak -- is remarkably easily subverted. Furthermore, this subversion is a very common practice, deeply ingrained in a C programmer (I speak for myself). The motivation to answer the challenges was to illustrate that it is possible to decompose the type function of polymorphic lists in such a way that the compile-time part will be in the perfect agreement with the C type system. Of course because the type system of C is so weak, "the burden of proof" gets shifted onto the run time. The exercise also showed just how hard it is to program in C without explicit or implicit casts.

References

The present page is a compilation of three articles posted on comp.lang.functional on Sep 29 and Oct 1 and 2, 2000.

[poly-list-v1.c] The first version of the code
<poly-list-v1.c> [1K]

[poly-list.c] The final version of the code
<poly-list.c> [7K]

[Disc-thread] Discussion thread Re: FP's and OO and generic programming on comp.lang.functional
<http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=4b87b55f932059d4&seekm=8ravsq%24c2t%241%40nnrp1.deja.com#p>

[CARD85] Luca Cardelli, P. Wegner, On Understanding Types, Data Abstraction, and Polymorphism. ACM Comp. Surveys, 17(4):471-522, Dec. 1985

Last updated May 4, 2001

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from SXML by SXML->HTML