We present two heterogeneous metaprogramming systems whose host language is OCaml and object language is C. The first relies on offshoring: treating a subset of (MetaOCaml-generated) OCaml as a different notation for (a subset of) C. The second embeds C in OCaml in tagless-final style. The systems have been used in several projects, including the generation of C supersets OpenCL and OpenMP.
Generating C with some correctness guarantees is far less trivial than it may appear, with pitfalls abound. Not coincidentally, the most subtle ones accompany the introduction of variables into the code. Maintaining the offshoring system has traps of its own. We expound the pitfalls we have came across in our experience, and describe counter-measures.
let addv = fun n vout v1 v2 -> for i=0 to n-1 do vout.(i) <- v1.(i) + v2.(i) doneIt is rather easy to imagine the C code it corresponds to. If we could implement this correspondence, then by generating OCaml we may, in effect, generate C. Offshoring is the library that realizes such a correspondence. Thus,
Offshoring is very different from an OCaml-to-C compiler. The latter has to handle the full OCaml. Offshoring needs to deal only with a small, imperative subset of OCaml, without closures, higher-order functions, algebraic data types, etc. Therefore, it is much simpler than a compiler to implement and maintain, and much easier to ensure correctness. Although the supported subset of OCaml is small, it has proved adequate for numerical, high-performance computing (including OpenCL and OpenMP programming) and embedded system programming.
Offshoring was first proposed by Eckhardt, Kaiabachev, Pasalic, Swadi and Taha (New Generation Computing, 2007). It is completely re-thought and re-implemented in BER MetaOCaml since version N107. The paper Generating C describes this system, also mentioning why the original offshoring became unavailable and what has been done to make the re-implementation to last. Here we elaborate the running example of offshoring, and point to further examples.
The running example is the vector addition code addv
above. The input to
offshoring is MetaOCaml generated code: so-called closed code
value. In our example, the simplest way to obtain such a value is by
enclosing the addv
code in MetaOCaml brackets:
let addv_staged = .<fun n vout v1 v2 -> for i=0 to n-1 do vout.(i) <- v1.(i) + v2.(i) done>.Code values can be printed out:
val addv_staged : (int -> int array -> int array -> int array -> unit) code = .< fun n_1 vout_2 v1_3 v2_4 -> for i_5 = 0 to n_1 - 1 do Stdlib.Array.set vout_2 i_5 ((Stdlib.Array.get v1_3 i_5) + (Stdlib.Array.get v2_4 i_5)) done>.We see the original
addv
, but with the desugared array indexing
operations and with renamed identifiers, to ensure uniqueness.
To easily support a variety of low-level languages, offshoring actually maps a subset of OCaml to a simple imperative `intermediate language' (IR). One may see the IR code for the running example by
let _ = Offshoring.(offshore (module DefaultConv) addv_staged)whose result is
- : proc_t = ([("n_1", TInt); ("vout_2", TArray1 TInt); ("v1_3", TArray1 TInt); ("v2_4", TArray1 TInt)], TUnit, Block (<abstr>, For {id = "i_5"; ty = TInt; lwb = Const (Const_int 0); upb = LocalVar "n_1"; step = Const (Const_int 1); body = Block (<abstr>, Exp (FunCall ("array1_set", [LocalVar "vout_2"; LocalVar "i_5"; FunCall ("+", [FunCall ("array1_get", [LocalVar "v1_3"; LocalVar "i_5"]); FunCall ("array1_get", [LocalVar "v2_4"; LocalVar "i_5"])])])))}))Such an IR expression may then be `pretty-printed', say, to C. The function
offshore_to_c
combines the IR translation with pretty-printing. For example,
let _ = Offshoring.offshore_to_c ~name:"addv" addv_stagedproduces (on the standard output; the optional
~out
argument
may specify a different output stream)
void addv(int n_1,int * vout_2,int * v1_3,int * v2_4){ for (int i_5 = 0; i_5 < n_1; i_5 += 1) (vout_2[i_5]) = (v1_3[i_5]) + (v2_4[i_5]); }which is how one might have imagined the C code corresponding to the original OCaml
addv
.
Suppose the size of the arrays to add is statically known, and is
small enough so that we wish to fully unroll the loop. To generate the
unrolled code, we first have to generalize addv
:
let iota n = List.init n Fun.id let addvg n vout v1 v2 = iota n |> List.iter (fun i -> vout.(i) <- v1.(i) + v2.(i))and test on sample arrays that
addvg
has the same behavior as addv
.
Trying to offshore addvg
naively
let addvg_staged_full = .<fun n vout v1 v2 -> iota n |> List.iter (fun i -> vout.(i) <- v1.(i) + v2.(i))>.gives an error however: the code value (the bracketed expression) contains a first-class function and is hence outside the offshorable subset.
If the size n
of the arrays is indeed known at the
generation time, addvg
may be staged differently:
let addvg_staged n = .<fun vout v1 v2 -> .~(iota n |> List.map (fun i -> .<vout.(i) <- v1.(i) + v2.(i)>.) |> seq)>. (* val addvg_staged : int -> (int array -> int array -> int array -> unit) code *)where
seq : unit code list -> unit code
builds a code sequence. As
the inferred type indicates, addvg_staged
takes an integer and
produces the code of a function of three array arguments. For example,
let addvg_staged4 = addvg_staged 4
generates:
val addvg_staged4 : (int array -> int array -> int array -> unit) code = .< fun vout_11 v1_12 v2_13 -> Stdlib.Array.set vout_11 0 ((Stdlib.Array.get v1_12 0) + (Stdlib.Array.get v2_13 0)); Stdlib.Array.set vout_11 1 ((Stdlib.Array.get v1_12 1) + (Stdlib.Array.get v2_13 1)); Stdlib.Array.set vout_11 2 ((Stdlib.Array.get v1_12 2) + (Stdlib.Array.get v2_13 2)); Stdlib.Array.set vout_11 3 ((Stdlib.Array.get v1_12 3) + (Stdlib.Array.get v2_13 3))>.We may again test that the generated code has the behavior of
addv 4
. Finally, we map it to C:
let _ = Offshoring.offshore_to_c ~name:"addv4" addvg_staged4obtaining
void addv4(int * vout_11,int * v1_12,int * v2_13){ (vout_11[0]) = (v1_12[0]) + (v2_13[0]); (vout_11[1]) = (v1_12[1]) + (v2_13[1]); (vout_11[2]) = (v1_12[2]) + (v2_13[2]); (vout_11[3]) = (v1_12[3]) + (v2_13[3]); }
Thus, addv
is easy to stage for no unrolling, but addvg
for full
unrolling. One may wish to write vector addition in such a way so that
no unrolling, full unrolling, partial unrolling, scalar promotion
etc. can be done without constantly re-writing it. The example
addv.ml
demonstrates how to
write the vector addition once, in a clear, textbook form -- and apply a
variety of optimizations in a modular way, reusing or extending earlier
optimizations. Eventually, after strip mining and scalar promotion we
obtain C code with the OpenBLAS look-and-feel:
void addv(int n_16,double * ar_17,double * ar_18,double * ar_19){ const int t_21 = n_16 & -4; const int t_22 = (0 + 3) & -4; for (int i_33 = 0; i_33 < t_22; i_33 += 1) (ar_17[i_33]) = (ar_18[i_33]) + (ar_19[i_33]); for (int i_24 = t_22; i_24 < t_21; i_24 += 4){ const double t_26 = (ar_18[i_24 + 0]) + (ar_19[i_24 + 0]); const double t_28 = (ar_18[i_24 + 1]) + (ar_19[i_24 + 1]); const double t_30 = (ar_18[i_24 + 2]) + (ar_19[i_24 + 2]); const double t_32 = (ar_18[i_24 + 3]) + (ar_19[i_24 + 3]); (ar_17[i_24 + 0]) = t_26; (ar_17[i_24 + 1]) = t_28; (ar_17[i_24 + 2]) = t_30; (ar_17[i_24 + 3]) = t_32; } for (int i_23 = t_21; i_23 < n_16; i_23 += 1) (ar_17[i_23]) = (ar_18[i_23]) + (ar_19[i_23]); }
An illustration of the extensibility of offshoring: generating C
code that includes calls to library functions (C file stdio, in our case)
using library-specific data types (FILE
)
offshore_ext.ml [3K]
The code for the example
file_stub.ml [2K]
The stub for a simple subset of C FILE i/o
offshore_ext_gen.c [<1K]
Testing the generated C code
addv.ml [15K]
The complete development for the advanced example.
It generates addv_offshored.c
when made
addv_gen.c [2K]
Testing the generated C code
Generating C
The paper that presents the new offshoring, describing
the attractiveness, and also pitfalls and challenges of
treating a subset of OCaml as a notation for C
The running example is vector addition: the same example used for Offshoring. To remind, it is written in OCaml as
let addv = fun n vout v1 v2 -> for i=0 to n-1 do vout.(i) <- v1.(i) + v2.(i) done
We now try to write it as an embedded DSL program; specifically, a
program in a
statement-oriented DSL. Many low-level languages draw a sharp
distinction between an expression and a statement. Consider C:
x < 10
is an expression, but while(x < 10) {x = foo();}
is a
statement. Only expressions are allowed as the terminating condition of
while loop. An expression always yields a value; a statement often
does not. Still, there are value-yielding
statements: return
.
By examining the vector addition code, we determine the DSL operations needed to write it:
module type cde = sig type 'a exp (* Abstract type of an expression *) val int : int -> int exp val ( + ) : int exp -> int exp -> int exp val ( - ) : int exp -> int exp -> int exp type 'a stm (* A statement *) val for_ : int exp -> int exp -> (int exp -> unit stm) -> unit stm type 'a arr (* An array (variable) *) val array_get : 'a arr -> int exp -> 'a exp val array_set : 'a arr -> int exp -> 'a exp -> unit stm type 'a proc_t (* The complete function *) val mk3arr : (int exp -> int arr -> int arr -> int arr -> unit stm) -> (int -> int array -> int array -> int array -> unit) proc_t endIn the tagless-final style, an embedded DSL is specified as an OCaml signature. It defines types for DSL categories, and combinators that correspond to DSL expression forms. The type
'a exp
represents DSL expressions of type 'a
; 'a stm
statements, and
'a proc_t
complete
procedures. The constructor mk3arr
, the constructor of procedures
taking three array arguments, looks too particular. We can generalize
later. The strength of the tagless-final approach is extensibility:
therefore, we may start with a particular and
add more general when and if needed.
The cde
signature indeed lets us express vector addition:
module ADDV(C:cde) = struct open C let addv_dsl = mk3arr @@ fun n vout v1 v2 -> for_ (int 0) (n - int 1) @@ fun i -> array_set vout i @@ array_get v1 i + array_get v2 i endIt looks quite like the OCaml
addv
code, after desugaring (of array
indexing operators). With custom indexing operators one can make the
DSL even more resembling addv
.
One obvious implementation of the signature is metacircular: 'a exp
,
'a stm
and 'a proc_t
are all just 'a
. That is, the DSL is
OCaml itself. Such an implementation is useful for testing. We leave
it as an exercise to the reader.
Another implementation is in terms of the MetaOCaml code:
module MOCde : (cde with type 'a proc_t = 'a code) = struct type 'a exp = 'a code let int = fun x -> .<x>. let ( + ) = fun x y -> .<.~x + .~y>. let ( - ) = fun x y -> .<.~x - .~y>. type 'a stm = 'a code let for_ lwb upb body = .<for i= .~lwb to .~upb do .~(body .<i>.) done>. type 'a arr = 'a array code let array_get a i = .< (.~a).(.~i) >. let array_set a i v = .< (.~a).(.~i) <- .~v >. type 'a proc_t = 'a code (* The complete function *) let mk3arr body = .<fun n vout v1 v2 -> .~(body .<n>. .<vout>. .<v1>. .<v2>.)>. endIt is in effect a compiler, from the DSL to OCaml. For example,
let module M = ADDV(MOCde) in M.addv_dsl
produces
- : (int -> int array -> int array -> int array -> unit) code = .< fun n_1 vout_2 v1_3 v2_4 -> for i_5 = 0 to n_1 - 1 do Stdlib.Array.set vout_2 i_5 ((Stdlib.Array.get v1_3 i_5) + (Stdlib.Array.get v2_4 i_5)) done>.which is identical to the result
addv_staged
in the Offshoring example earlier.
But we are interested in C. Let's write the implementation of cde
that generates C (repetitive definitions are elided; see
the complete code for detail):
module CCde = struct type 'a exp = string let int = string_of_int let ( + ) = Printf.sprintf "(%s + %s)" type 'a stm = string let for_ lwb upb body = let i = gensym "i" in "for(int " ^ i ^ "=" ^ lwb ^ "; " ^ i ^ "<=" ^ upb ^ "; " ^ i ^ "++){\n" ^ " " ^ body i ^ "}" type 'a arr = string let array_get = Printf.sprintf "%s[%s]" ... endNow,
let _ = let module M = ADDV(CCde) in print_endline M.addv_dsloutputs
void addv(int n_1, int* vout_2, int* v1_3, int* v2_4){ for(int i_5=0; i_5<=(n_1 - 1); i_5++){ vout_2[i_5]=(v1_3[i_5] + v2_4[i_5]);} }This is exactly the
addv
C code produced by Offshoring.
We have generated both OCaml and C code
from the same DSL expression (module ADDV
).
There is a lot of room for improvement. For example, printf
and
string concatenations are an eye sore. Not for the DSL user however:
They are all hidden behind the abstraction layer. The abstraction
layer also guarantees the type safety, and the absence of unbound
variable names.
To help a DSL implementator, however, it makes sense to generate a C AST, to pretty-print at the end -- which is what we do in the real project.
Let's consider the first extension: mutable variables. They are not
as simple as they appear (see the separate web page on this
topic). The culprit, as usual, is aliasing. Mutable variables
presented a big problem for offshoring, which was solved just
recently. The signature below incorporates the learned
lessons: the main is distinguishing mutable variables,
represented by the dedicated type 'a mut
.
From the DSL point of view, this is
the syntactic distinction: Mutable variables are not
expressions. Therefore, mutable
variables in our DSL (as in C, Pascal, etc.) are not first class: one
cannot pass them as an argument or return as a
result. One may only dereference of assign to them.
module type cdemut = sig include cde val float : float -> float exp (* also add floats for the sake of example *) val ( +. ) : float exp -> float exp -> float exp type 'a mut (* mutable variables *) val (let*) : 'a exp -> ('a mut -> 'w stm) -> 'w stm val dref : 'a mut -> 'a exp val (:=) : 'a mut -> 'a exp -> unit stm val (@.) : unit stm -> 'a stm -> 'a stm (* sequencing *) val ret : 'a exp -> 'a stm (* return *) val mk1arr : (int exp -> float arr -> 'a stm) -> (int -> float array -> 'a) proc_t end
The types make it clear that mutable variables may only be introduced in the statement context. Aliasing is impossible: an expression like
let* x = 0 in let* y = x in ...gives a type error pointing at the second x:
Error: This expression has type int mut but an expression was expected of type 'a expWe may only write
let* x = 0 in let* y = dref x in ...where it is clear that
y
is not an alias of x
; it is a new mutable
DSL variable initialized to the current value of x
.
The extended cdemut
signature lets us write
the next example: vector summation.
module VSUM(C:cdemut) = struct open C let vsum = mk1arr @@ fun n v -> let* sum = float 0. in begin for_ (int 0) (n - int 1) @@ fun i -> sum := dref sum +. array_get v i end @. ret (dref sum) end
Since defining new variables in C requires writing their type, our compiler has to perform type inference, which is straightforward. (Again, some definitions are elided for clarity.)
module CMutCde : (cdemut with type 'a proc_t = string) = struct type 'a typ = | TInt : int typ | TFloat : float typ | TUnit : unit typ let string_of_typ : type a. a typ -> string = function | TInt -> "int" | TFloat -> "double" | TUnit -> "void" type 'a exp = 'a typ * 'a CCde.exp let int x = (TInt, string_of_int x) (* typing rule *) let trule2 : type a b c. a typ -> b typ -> c typ -> (a CCde.exp -> b CCde.exp -> c CCde.exp) -> (a exp -> b exp -> c exp) = fun ty1 ty2 tyr f -> fun (_,x) (_,y) -> (tyr, f x y) let ( + ) = trule2 TInt TInt TInt CCde.( + ) let ( +.) = trule2 TFloat TFloat TFloat (Printf.sprintf "(%s + %s)") type 'a stm = 'a typ * 'a CCde.stm let for_ (TInt,lwb) (TInt,upb) body = (TUnit, CCde.for_ lwb upb (fun i -> let (TUnit,b) = body (TInt,i) in b)) type 'a arr = 'a typ * 'a CCde.arr let array_get (ty, a) (TInt, i) = (ty, CCde.array_get a i) let ret (t,x) = (t, Printf.sprintf "return %s;" x) let (@.) (TUnit,s1) (t,s2) = (t,Printf.sprintf "%s\n%s" s1 s2) type 'a mut = 'a typ * string let (let*) : 'a exp -> ('a mut -> 'w stm) -> 'w stm = fun (t,x) body -> let v = gensym "x" in let (tb,b) = body (t,v) in (tb,Printf.sprintf "%s %s = %s;\n%s" (string_of_typ t) v x b) let dref (t,v) = (t,v) let (:=) (_,v) (_,x) = (TUnit, Printf.sprintf "%s = %s;" v x) end
Since CMutCde
is an extended interpreter, it can interpret the
earlier code like ADDV
, with the same result. For vector summation,
let _ = let module M = VSUM(CMutCde) in print_endline M.vsumwe obtain
double sumv(int n_11, double* v_12){ double x_13 = 0.; for(int i_14=0; i_14<=(n_11 - 1); i_14++){ x_13 = (x_13 + v_12[i_14]);} return x_13; }
We can also implement the signature cdemut
in terms of the
Offshoring IR (see the paper), and hence take the full advantage of
its pretty-printers, to C or other low-level languages.