We present two heterogeneous metaprogramming systems whose host language is OCaml and object language is C. The first relies on offshoring: treating a subset of (MetaOCaml-generated) OCaml as a different notation for (a subset of) C. The second embeds C in OCaml in tagless-final style. The systems have been used in several projects, including the generation of C supersets OpenCL and OpenMP.
Generating C with some correctness guarantees is far less trivial than it may appear, with pitfalls abound. Not coincidentally, the most subtle ones accompany the introduction of variables into the code. Maintaining the offshoring system has traps of its own. We expound the pitfalls we have came across in our experience, and describe counter-measures.
Magnus Myreen. Quoted by permission.``This make me think that the authors are under the impression that compilers can magically transform bad code into efficient high-quality code by clever enough optimizations. As a compiler writer, I consider this a complete myth that many students and non-compiler-developers hope is a truth. I agree that compilers should aim to improve the efficiency of the code given as input, but one cannot expect the compiler to recognize algorithms and swap them for ones with better asymptotic complexity or make other major code transformations that really should be the job of the programmer. [emphasis mine]''
The reason for existence of code generation is helping programmers make major code transformations -- by conveniently and concisely expressing the transformation steps and by relieving the programmers from worrying if the result even compiles.
let addv = fun n vout v1 v2 -> for i=0 to n-1 do vout.(i) <- v1.(i) + v2.(i) doneIt is rather easy to imagine the C code it corresponds to. If we could implement this correspondence, then by generating OCaml we may, in effect, generate C. Offshoring is the library that realizes such a correspondence. Thus,
Offshoring is very different from an OCaml-to-C compiler. The latter has to handle the full OCaml. Offshoring needs to deal only with a small, imperative subset of OCaml, without closures, higher-order functions, algebraic data types, etc. Therefore, it is much simpler than a compiler to implement and maintain, and much easier to ensure correctness. Although the supported subset of OCaml is small, it has proved adequate for numerical, high-performance computing (including OpenCL and OpenMP programming) and embedded system programming.
Offshoring was first proposed by Eckhardt, Kaiabachev, Pasalic, Swadi and Taha (New Generation Computing, 2007). It is completely re-thought and re-implemented in BER MetaOCaml since version N107. The paper Generating C describes this system, also mentioning why the original offshoring became unavailable and what has been done to make the re-implementation to last. Here we elaborate the running example of offshoring, and point to further examples.
The running example is the vector addition code addv
above. The input to
offshoring is MetaOCaml generated code: so-called closed code
value. In our example, the simplest way to obtain such a value is by
enclosing the addv
code in MetaOCaml brackets:
let addv_staged = .<fun n vout v1 v2 -> for i=0 to n-1 do vout.(i) <- v1.(i) + v2.(i) done>.Code values can be printed out:
val addv_staged : (int -> int array -> int array -> int array -> unit) code = .< fun n_1 vout_2 v1_3 v2_4 -> for i_5 = 0 to n_1 - 1 do Stdlib.Array.set vout_2 i_5 ((Stdlib.Array.get v1_3 i_5) + (Stdlib.Array.get v2_4 i_5)) done>.We see the original
addv
, but with the desugared array indexing
operations and with renamed identifiers, to ensure uniqueness.
To easily support a variety of low-level languages, offshoring actually maps a subset of OCaml to a simple imperative `intermediate language' (IR). One may see the IR code for the running example by
let _ = Offshoring.(offshore (module DefaultConv) addv_staged)whose result is
- : proc_t = ([("n_1", TInt); ("vout_2", TArray1 TInt); ("v1_3", TArray1 TInt); ("v2_4", TArray1 TInt)], TUnit, Block (<abstr>, For {id = "i_5"; ty = TInt; lwb = Const (Const_int 0); upb = LocalVar "n_1"; step = Const (Const_int 1); body = Block (<abstr>, Exp (FunCall ("array1_set", [LocalVar "vout_2"; LocalVar "i_5"; FunCall ("+", [FunCall ("array1_get", [LocalVar "v1_3"; LocalVar "i_5"]); FunCall ("array1_get", [LocalVar "v2_4"; LocalVar "i_5"])])])))}))Such an IR expression may then be `pretty-printed', say, to C. The function
offshore_to_c
combines the IR translation with pretty-printing. For example,
let _ = Offshoring.offshore_to_c ~name:"addv" addv_stagedproduces (on the standard output; the optional
~out
argument
may specify a different output stream)
void addv(int n_1,int * vout_2,int * v1_3,int * v2_4){ for (int i_5 = 0; i_5 < n_1; i_5 += 1) (vout_2[i_5]) = (v1_3[i_5]) + (v2_4[i_5]); }which is how one might have imagined the C code corresponding to the original OCaml
addv
.
Suppose the size of the arrays to add is statically known, and is
small enough so that we wish to fully unroll the loop. To generate the
unrolled code, we first have to generalize addv
:
let iota n = List.init n Fun.id let addvg n vout v1 v2 = iota n |> List.iter (fun i -> vout.(i) <- v1.(i) + v2.(i))and test on sample arrays that
addvg
has the same behavior as addv
.
Trying to offshore addvg
naively
let addvg_staged_full = .<fun n vout v1 v2 -> iota n |> List.iter (fun i -> vout.(i) <- v1.(i) + v2.(i))>.gives an error however: the code value (the bracketed expression) contains a first-class function and is hence outside the offshorable subset.
If the size n
of the arrays is indeed known at the
generation time, addvg
may be staged differently:
let addvg_staged n = .<fun vout v1 v2 -> .~(iota n |> List.map (fun i -> .<vout.(i) <- v1.(i) + v2.(i)>.) |> seq)>. (* val addvg_staged : int -> (int array -> int array -> int array -> unit) code *)where
seq : unit code list -> unit code
builds a code sequence. As
the inferred type indicates, addvg_staged
takes an integer and
produces the code of a function of three array arguments. For example,
let addvg_staged4 = addvg_staged 4
generates:
val addvg_staged4 : (int array -> int array -> int array -> unit) code = .< fun vout_11 v1_12 v2_13 -> Stdlib.Array.set vout_11 0 ((Stdlib.Array.get v1_12 0) + (Stdlib.Array.get v2_13 0)); Stdlib.Array.set vout_11 1 ((Stdlib.Array.get v1_12 1) + (Stdlib.Array.get v2_13 1)); Stdlib.Array.set vout_11 2 ((Stdlib.Array.get v1_12 2) + (Stdlib.Array.get v2_13 2)); Stdlib.Array.set vout_11 3 ((Stdlib.Array.get v1_12 3) + (Stdlib.Array.get v2_13 3))>.We may again test that the generated code has the behavior of
addv 4
. Finally, we map it to C:
let _ = Offshoring.offshore_to_c ~name:"addv4" addvg_staged4obtaining
void addv4(int * vout_11,int * v1_12,int * v2_13){ (vout_11[0]) = (v1_12[0]) + (v2_13[0]); (vout_11[1]) = (v1_12[1]) + (v2_13[1]); (vout_11[2]) = (v1_12[2]) + (v2_13[2]); (vout_11[3]) = (v1_12[3]) + (v2_13[3]); }
Thus, addv
is easy to stage for no unrolling, but addvg
for full
unrolling. One may wish to write vector addition in such a way so that
no unrolling, full unrolling, partial unrolling, scalar promotion
etc. can be done without constantly re-writing it. The example
addv.ml
demonstrates how to
write the vector addition once, in a clear, textbook form -- and apply a
variety of optimizations in a modular way, reusing or extending earlier
optimizations. Eventually, after strip mining and scalar promotion we
obtain C code with the OpenBLAS look-and-feel:
void addv(int n_16,double * ar_17,double * ar_18,double * ar_19){ const int t_21 = n_16 & -4; const int t_22 = (0 + 3) & -4; for (int i_33 = 0; i_33 < t_22; i_33 += 1) (ar_17[i_33]) = (ar_18[i_33]) + (ar_19[i_33]); for (int i_24 = t_22; i_24 < t_21; i_24 += 4){ const double t_26 = (ar_18[i_24 + 0]) + (ar_19[i_24 + 0]); const double t_28 = (ar_18[i_24 + 1]) + (ar_19[i_24 + 1]); const double t_30 = (ar_18[i_24 + 2]) + (ar_19[i_24 + 2]); const double t_32 = (ar_18[i_24 + 3]) + (ar_19[i_24 + 3]); (ar_17[i_24 + 0]) = t_26; (ar_17[i_24 + 1]) = t_28; (ar_17[i_24 + 2]) = t_30; (ar_17[i_24 + 3]) = t_32; } for (int i_23 = t_21; i_23 < n_16; i_23 += 1) (ar_17[i_23]) = (ar_18[i_23]) + (ar_19[i_23]); }
An illustration of the extensibility of offshoring: generating C
code that includes calls to library functions (C file stdio, in our case)
using library-specific data types (FILE
)
offshore_ext.ml [3K]
The code for the example
file_stub.ml [2K]
The stub for a simple subset of C FILE i/o
offshore_ext_gen.c [<1K]
Testing the generated C code
addv.ml [15K]
The complete development for the advanced example.
It generates addv_offshored.c
when made
addv_gen.c [2K]
Testing the generated C code
Generating C
The paper that presents the new offshoring, describing
the attractiveness, and also pitfalls and challenges of
treating a subset of OCaml as a notation for C
The running example is vector addition: the same example used for Offshoring. To remind, it is written in OCaml as
let addv = fun n vout v1 v2 -> for i=0 to n-1 do vout.(i) <- v1.(i) + v2.(i) done
We now try to write it as an embedded DSL program; specifically, a
program in a
statement-oriented DSL. Many low-level languages draw a sharp
distinction between an expression and a statement. Consider C:
x < 10
is an expression, but while(x < 10) {x = foo();}
is a
statement. Only expressions are allowed as the terminating condition of
while loop. An expression always yields a value; a statement often
does not. Still, there are value-yielding
statements: return
.
By examining the vector addition code, we determine the DSL operations needed to write it:
module type cde = sig type 'a exp (* Abstract type of an expression *) val int : int -> int exp val ( + ) : int exp -> int exp -> int exp val ( - ) : int exp -> int exp -> int exp type 'a stm (* A statement *) val for_ : int exp -> int exp -> (int exp -> unit stm) -> unit stm type 'a arr (* An array (variable) *) val array_get : 'a arr -> int exp -> 'a exp val array_set : 'a arr -> int exp -> 'a exp -> unit stm type 'a proc_t (* The complete function *) val mk3arr : (int exp -> int arr -> int arr -> int arr -> unit stm) -> (int -> int array -> int array -> int array -> unit) proc_t endIt is a typed first-order statement-oriented DSL embedded in OCaml. In the tagless-final style, an embedded DSL is specified as an OCaml signature, which defines types for DSL categories, and combinators that correspond to DSL expression forms. (Speaking algebraically, the embedded language is represented as a (multi-sorted) algebra whose operations are the syntactic forms of the language.) The type
'a exp
represents DSL expressions of type 'a
; 'a stm
statements, and
'a proc_t
complete procedures. Expressions are built of integer
literals and variables (at present, arguments) using addition,
subtraction and array dereference operations. Statements are the
for-loop and array element assignment. The operation mk3arr
builds
the procedure header (of procedures taking three array arguments) and
provides arguments to use in the function body. It looks too
particular. We can generalize later. The strength of the tagless-final
approach is extensibility: therefore, we may start with a particular
and add more general when and if needed.
The operations of the cde
signature indeed let us express vector addition:
module ADDV(C:cde) = struct open C let addv_dsl = mk3arr @@ fun n vout v1 v2 -> for_ (int 0) (n - int 1) @@ fun i -> array_set vout i @@ array_get v1 i + array_get v2 i endIt looks quite like the OCaml
addv
code above, after desugaring of array
indexing operators. With custom indexing operators one can make addv_dsl
even more resembling addv
.
One obvious implementation of the signature is metacircular: 'a exp
,
'a stm
and 'a proc_t
are all just 'a
. That is, the DSL is (a
subset of) OCaml itself. Such an implementation is useful for
testing. We leave it as an exercise to the reader.
Another implementation is in terms of the MetaOCaml code:
module MOCde : (cde with type 'a proc_t = 'a code) = struct type 'a exp = 'a code let int = fun x -> .<x>. let ( + ) = fun x y -> .<.~x + .~y>. let ( - ) = fun x y -> .<.~x - .~y>. type 'a stm = 'a code let for_ lwb upb body = .<for i= .~lwb to .~upb do .~(body .<i>.) done>. type 'a arr = 'a array code let array_get a i = .< (.~a).(.~i) >. let array_set a i v = .< (.~a).(.~i) <- .~v >. type 'a proc_t = 'a code (* The complete function *) let mk3arr body = .<fun n vout v1 v2 -> .~(body .<n>. .<vout>. .<v1>. .<v2>.)>. endIt is in effect a compiler, from the DSL to OCaml. For example,
let module M = ADDV(MOCde) in M.addv_dsl
produces
- : (int -> int array -> int array -> int array -> unit) code = .< fun n_1 vout_2 v1_3 v2_4 -> for i_5 = 0 to n_1 - 1 do Stdlib.Array.set vout_2 i_5 ((Stdlib.Array.get v1_3 i_5) + (Stdlib.Array.get v2_4 i_5)) done>.which is identical to the result
addv_staged
in the Offshoring example earlier.
But we are interested in C. Let's write the implementation of cde
that generates C (repetitive definitions are elided; see
the complete code for detail):
module CCde = struct type 'a exp = string let int = string_of_int let ( + ) = Printf.sprintf "(%s + %s)" type 'a stm = string let for_ lwb upb body = let i = gensym "i" in "for(int " ^ i ^ "=" ^ lwb ^ "; " ^ i ^ "<=" ^ upb ^ "; " ^ i ^ "++){\n" ^ " " ^ body i ^ "}" type 'a arr = string let array_get = Printf.sprintf "%s[%s]" ... endNow,
let _ = let module M = ADDV(CCde) in print_endline M.addv_dsloutputs
void addv(int n_1, int* vout_2, int* v1_3, int* v2_4){ for(int i_5=0; i_5<=(n_1 - 1); i_5++){ vout_2[i_5]=(v1_3[i_5] + v2_4[i_5]);} }This is exactly the
addv
C code produced by Offshoring.
We have generated both OCaml and C code
from the same DSL expression (addv_dsl
).
The C and OCaml code for addv
indeed look very similar,
which was the inspiration for offshoring. They both look
similar to addv_dsl
. The similarities should not be
surprising: the addv expressions in different languages are different
interpretations of the same DSL expression addv_dsl
. Speaking
algebraically, they are different homomorphic images of the same
term. One may write many more similar implementations of the cde
signature, to obtain code in Fortran, WASM, LLVM IR, etc.
There is much room for improvement. For example, printf
and string
concatenations are an eye sore. Not for the DSL user however: They are
all hidden behind the abstraction layer. The end user only uses the
combinators of the cde
signature, as in addv_dsl
. The combinators
are typed and enforce the typing discipline on the generated
language. The type of the for_
combinator, for example, ensures that
the lower and upper bounds of the loop are integer expressions, and
the loop body is a statement. The existence of the metacircular or MOCde
implementations shows that the typing is compatible with OCaml typing,
and is, hence, sound. To ascertain that only well-formed code is
generated, we only have to examine the implementation CCde
(and
check, e.g., that statement generators produce code ending in a
closing brace or a semicolon). After such check we may claim that
any well-typed application of the combinators produces
well-formed and well-typed code -- without any need to examine that
code. (Our actual implementation uses C AST for C code
generation, which ensures well-formedness. C AST does not ensure
well-typedness however; but the tagless-final encapsulation does.)
Finally, because all the details of generation are hidden in DSL
combinators, the user-written code like addv_dsl
remains legible.
'a mut
. From the DSL point of
view, this is the syntactic distinction: Mutable variables are
not expressions. Therefore, mutable variables in our DSL (as in C,
Pascal, etc.) are not first class: one cannot pass them as an argument
or return as a result. One may only dereference of assign to them.
module type cdemut = sig include cde val float : float -> float exp (* also add floats for the sake of example *) val ( +. ) : float exp -> float exp -> float exp type 'a mut (* mutable variables *) val (let*) : 'a exp -> ('a mut -> 'w stm) -> 'w stm val dref : 'a mut -> 'a exp val (:=) : 'a mut -> 'a exp -> unit stm val (@.) : unit stm -> 'a stm -> 'a stm (* sequencing *) val ret : 'a exp -> 'a stm (* return *) val mk1arr : (int exp -> float arr -> 'a stm) -> (int -> float array -> 'a) proc_t end
The types make it clear that mutable variables may only be introduced in the statement context. Aliasing is impossible: an expression like
let* x = 0 in let* y = x in ...gives a type error pointing at the second x:
Error: This expression has type int mut but an expression was expected of type 'a expWe may only write
let* x = 0 in let* y = dref x in ...where it is clear that
y
is not an alias of x
; it is a new mutable
DSL variable initialized to the current value of x
.
The extended cdemut
signature lets us write
the next example: vector summation.
module VSUM(C:cdemut) = struct open C let vsum = mk1arr @@ fun n v -> let* sum = float 0. in begin for_ (int 0) (n - int 1) @@ fun i -> sum := dref sum +. array_get v i end @. ret (dref sum) end
Since defining new variables in C requires writing their type, our compiler has to perform type inference, which is straightforward. (Again, some definitions are elided for clarity.)
module CMutCde : (cdemut with type 'a proc_t = string) = struct type 'a typ = | TInt : int typ | TFloat : float typ | TUnit : unit typ let string_of_typ : type a. a typ -> string = function | TInt -> "int" | TFloat -> "double" | TUnit -> "void" type 'a exp = 'a typ * 'a CCde.exp let int x = (TInt, string_of_int x) (* typing rule *) let trule2 : type a b c. a typ -> b typ -> c typ -> (a CCde.exp -> b CCde.exp -> c CCde.exp) -> (a exp -> b exp -> c exp) = fun ty1 ty2 tyr f -> fun (_,x) (_,y) -> (tyr, f x y) let ( + ) = trule2 TInt TInt TInt CCde.( + ) let ( +.) = trule2 TFloat TFloat TFloat (Printf.sprintf "(%s + %s)") type 'a stm = 'a typ * 'a CCde.stm let for_ (TInt,lwb) (TInt,upb) body = (TUnit, CCde.for_ lwb upb (fun i -> let (TUnit,b) = body (TInt,i) in b)) type 'a arr = 'a typ * 'a CCde.arr let array_get (ty, a) (TInt, i) = (ty, CCde.array_get a i) let ret (t,x) = (t, Printf.sprintf "return %s;" x) let (@.) (TUnit,s1) (t,s2) = (t,Printf.sprintf "%s\n%s" s1 s2) type 'a mut = 'a typ * string let (let*) : 'a exp -> ('a mut -> 'w stm) -> 'w stm = fun (t,x) body -> let v = gensym "x" in let (tb,b) = body (t,v) in (tb,Printf.sprintf "%s %s = %s;\n%s" (string_of_typ t) v x b) let dref (t,v) = (t,v) let (:=) (_,v) (_,x) = (TUnit, Printf.sprintf "%s = %s;" v x) end
Since CMutCde
is an extended interpreter, it can interpret the
earlier code like ADDV
, with the same result. For vector summation,
let _ = let module M = VSUM(CMutCde) in print_endline M.vsumwe obtain
double sumv(int n_11, double* v_12){ double x_13 = 0.; for(int i_14=0; i_14<=(n_11 - 1); i_14++){ x_13 = (x_13 + v_12[i_14]);} return x_13; }
We can also implement the signature cdemut
in terms of the
Offshoring IR (see the paper), and hence take the full advantage of
its pretty-printers, to C or other low-level languages.