MetaOCaml -- an OCaml dialect for multi-stage programming

 
BER MetaOCaml is a conservative extension of OCaml for ``writing programs that generate programs''. BER MetaOCaml adds to OCaml the type of code values (denoting ``program code'', or future-stage computations), and two basic constructs to build them: quoting and splicing. The generated code can be printed, stored in a file -- or compiled and linked-back to the running program, thus implementing run-time code optimization. An imperative subset of the generated code can also be converted to C, via offshoring. A well-typed BER MetaOCaml program generates only well-scoped and well-typed programs: The generated code shall compile without type errors. BER MetaOCaml is the complete re-implementation of the original MetaOCaml by Walid Taha, Cristiano Calcagno and collaborators.

 

Summary

MetaOCaml is a conservative extension of OCaml with staging annotations to construct typed code values. See below for a brief introduction to these features. The generated code can be printed, stored in a file -- or compiled and linked-back to the running program, thus implementing run-time code optimization. MetaOCaml code without staging annotations, or with the annotations erased, is regular OCaml.

MetaOCaml has been successfully used for the most optimal stream fusion, generating fast parsers and regular-expression matchers, specializing numeric and dynamic programming algorithms, building FFT kernels, compilers for an image processing and database query DSLs, OCaml server pages, generating families of specialized basic linear algebra and Gaussian Elimination routines, and high-performance stencil computations.

MetaOCaml is distinguished from Camlp4 and other such macro-processors by: hygiene (maintaining lexical scope); generating assuredly well-typed code; and the integration with higher-order functions, modules and other abstraction facilities of ML, hence promoting modularity and reuse of code generators. A well-typed BER MetaOCaml program generates only well-typed programs: The generated code shall compile without type errors. There are no longer problems of puzzling out a compilation error in the generated code (which is typically large, obfuscated and with unhelpful variable names). The errors are reported in terms of the generator rather than the generated code.

MetaOCaml is purely generative: the generated code is treated as a black box and cannot be examined. One can put code together but cannot take it apart. Pure generativity significantly simplifies the type system and strengthens the static assurances. It may seem that pure generativity precludes code optimizations. Fortunately, that is not the case. The literature has many examples of generating optimal code with MetaOCaml.

BER MetaOCaml is in the spirit of the original MetaOCaml by Taha, Calcagno and contributors but has been completely re-written using different algorithms and techniques. The old MetaOCaml code still runs with little or no changes. Overall, BER MetaOCaml aims at the most harmonious way of integrating staging with OCaml, minimizing the differences. It becomes easier to keep MetaOCaml up-to-date and ensure its long-term viability.

BER MetaOCaml has been re-structured to minimize the amount of changes to the OCaml type-checker and to separate the `kernel' from the `user-level'. The kernel is a set of patches and additions to OCaml, responsible for producing and type-checking code values. The processing of built code values -- so-called `running' -- is user-level. Currently the user-level metalib supports printing, type-checking, byte- and native- compiling and linking of code values. In addition, the generated code can be `offshored' -- converted to the intermediate, imperative language which may then be transcribed to C, OpenCL, LLVM, etc.

BER MetaOCaml is almost entirely backwards compatible with the original MetaOCaml. To the user, the major differences of BER MetaOCaml are as follows (see below on this page for the detailed explanation):

Smaller visible differences are special `subtypes' for the code that represents a function literal or a value; more helpful error messages; better printing of cross-stage persistent values and the full support for labeled arguments. A long-standing problem with records with polymorphic fields has fixed itself. The BER MetaOCaml code is now extensively commented, and has a regression test suite. MetaOCaml has never supported generating code with local or first-class modules and objects. However one may use modules and objects in the code generator.

By relying on attributes, the feature of OCaml 4.02, BER N102 has become much closer integrated with OCaml (and BER N114 is more still). It is instructive to compare the amount of changes BER MetaOCaml makes to the OCaml distribution. An earlier version (BER N101) modified 32 OCaml files. BER N114 modifies only 4 (that number could be further reduced to one; the only file with nontrivial modifications is typing/typecore.ml). It is now a distinct possibility that -- with small hooks that may be provided in the future OCaml versions -- MetaOCaml becomes just a regular library or a plug-in, rather being a fork.

The file NOTES.txt in the BER MetaOCaml distribution describes the features of BER MetaOCaml in more detail and outlines directions for further development. The R&D program is extensive, ranging from the staging theory to the developing of translators of the generated code to C, LLVM, and other targets. There is fun and tasks for everyone. Hopefully the release of BER MetaOCaml would incite using and researching typed meta-programming.

Version
The current version is N114, May, 2023
References
ber-metaocaml-ChangeLog [11K]

The Design and Implementation of BER MetaOCaml

Stream Fusion, to Completeness
The latest, most notable application

Reconciling Abstraction with High Performance: A MetaOCaml approach
The MetaOCaml book

ml2013-talk.pdf [200K]
MetaOCaml lives on: Lessons from implementing a staged dialect of a functional language
The annotated slides of the talk at the ACM SIGPLAN Workshop on ML September 22, 2013. Boston, MA, USA

 

Installation and availability

BER MetaOCaml N114 corresponds to OCaml 4.14.1. To install BER MetaOCaml through OPAM, please do
    $ opam update
    $ opam switch 4.14.1+BER
    $ eval `opam config env`
If you use Tuareg, the following options are reported to be useful:
    Tuareg group
    (setq tuareg-support-metaocaml t)
 
    Tuareg Faces group
    Tuareg Font Lock Multistage Face

One may also install BER MetaOCaml manually. Befitting the separation between the kernel and the user-level, BER MetaOCaml is distributed as a set of patches to OCaml plus a separate library metalib. The installation involves patching the OCaml distribution, building the OCaml system system as usual, and compiling metalib and the BER MetaOCaml toplevel. The patched OCaml compiler is fully source- and binary-compatible with OCaml and can process OCaml code, with staging constructs or without. MetaOCaml can also use OCaml-compiled modules and libraries as they are. BER MetaOCaml is available as:

 

Introduction to staging and MetaOCaml

The standard example of meta-programming -- the running example of A.P.Ershov's 1977 paper that begat partial evaluation -- is the power function, computing x^n. (See also mult.ml for a more realistic and illustrative example.) In OCaml:
    let square x = x * x
    let rec power n x =
      if n = 0 then 1
      else if n mod 2 = 0 then square (power (n/2) x)
      else x * (power (n-1) x)
    (* val power : int -> int -> int = <fun> *)
(The comment underneath the code shows the reply of the MetaOCaml toplevel.)

Suppose our program has to compute x^7 many times. We may define

    let power7 x = power 7 x
    (* val power7 : int -> int = <fun> *)
to name and share the partial application.

In MetaOCaml, we may also specialize the power function to a particular value n, obtaining the code which will later receive x and compute x^n. We re-write power n x annotating expressions as computed `now' (when n is known) or `later' (when x is given).

    let rec spower n x =
      if n = 0 then .<1>.
      else if n mod 2 = 0 then .<square .~(spower (n/2) x)>.
      else .<.~x * .~(spower (n-1) x)>.
    (* val spower : int -> int code -> int code = <fun> *)
The two annotations, or staging constructs, are brackets .< e >. and escape .~e . Brackets .< e >. `quasi-quote' the expression e, annotating it as computed later. Escape .~e, which must be used within brackets, tells that e is computed now but produces the result for later. That result, the code, is spliced-in into the containing bracket. The inferred type of spower is different. The result is no longer an int, but int code -- the code of expressions that compute an int. The type of spower spells out which argument is received now, and which later: the future-stage argument has the code type. There is an alternative syntax for brackets and escapes, as extension nodes. For example, spower above can also be written as
    let rec spower n x =
      if n = 0 then [%metaocaml.bracket 1]
      else if n mod 2 = 0 then [%metaocaml.bracket square [%metaocaml.escape spower (n/2) x]]
      else [%metaocaml.bracket [%metaocaml.escape x] * [%metaocaml.escape spower (n-1) x]]
    (* val spower : int -> int code -> int code = <fun> *)
Depending on the editor and its set-up, this form may actually be easier to enter and display.

To specialize spower to 7, we define

    let spower7_code = .<fun x -> .~(spower 7 .<x>.)>.;;
    (*
    val spower7_code : (int -> int) code = .<
       fun x_1 -> x_1 * (csp_square_3 (x_1 * (csp_square_2 (x_1 * 1))))>. 
     *)
and obtain the code of a function that will receive x and return x^7. Code, even of functions, can be printed, which is what MetaOCaml toplevel just did. The print-out contains so-called cross-stage persistent values, or CSP, which `quote' present-stage values such as square to be used later. One may think of CSP as references to `external libraries' -- only in our case the program acts as a library for the code it generates. If square were defined in a separate file, say, sq.ml, then its occurrence in spower7_code would have been printed simply as Sq.square.

To use thus specialized x^7 now, in our code, we should `run' spower7_code, applying the function Runcode.run : 'a code -> 'a. The function compiles the code and links it back to our program.

    open Runcode;;
    let spower7 = run spower7_code
    (* val spower7 : int -> int = <fun> *)

The specialized spower7 has the same type as the partially applied power7 above. They behave identically. However, power7 x will do the recursion on n, checking n's parity, etc. In contrast, the specialized spower7 has no recursion (as can be seen from spower7_code). All operations on n have been done when the spower7_code was generated; the result is spower7 that straight-line multiplies x.

The generated spower7_code is rather simple (as generated code tends to be), and hence can be easily converted to C, via offshoring, described in detail elsewhere. We need to tell however about the square function we defined. Let us assume there exists a C function sqr, for example, defined as static int sqr(int const x) { return x * x; }, and we want to map our square to it.

    let _ = 
      let module M = struct         (* Tell offshoring library to map *)
        open Offshoring             (* our Sq.square to C's sqr       *)
        include DefaultConv
        let id_conv path name = match (path,name) with
        | ("Sq","square") -> "sqr"
        | _ -> id_conv path name
      end
      in
      Offshoring.offshore_to_c ~cnv:(module M) ~name:"power7" spower7_code
The result (which could have been saved into a file) is:
    int power7(int const x_1){
      return (x_1 * sqr(x_1 * sqr(x_1 * 1)));
    }

MetaOCaml supports an arbitrary number of later stages, letting us write not only code generators but also generators of code generators, etc.

References
Reconciling Abstraction with High Performance: A MetaOCaml approach

Walid Taha: A Gentle Introduction to Multi-stage Programming

A.P. Ershov: On the partial computation principle
Information Processing Letters, 1977, v6, N2, pp. 38-41.
This is the English version of the paper that started partial evaluation as a research area. The power example, in Fig. 1 of the paper, can be seen on the following manuscript scan
<http://ershov.iis.nsk.su/files/archive/fold0241/241_75.gif>

The Information Processing Letters paper is the English version of A. P. Ershov: A Theoretical Principle of System Programming
Doklady AN SSSR (Soviet Mathematics Doklady), 1977, v18,N2, pp.312--315.
For space reasons, the power example was omitted from the Doklady AN SSSR paper, but it can be seen in the notes to the November 1976 draft
<http://ershov.iis.nsk.su/files/archive/fold0241/241_17.gif>
This is perhaps the first occurrence of the power function specialization.

For other drafts of this paper, see ``The Ershov Archive for the history of computing''
<http://ershov.iis.nsk.su/archive/eaindex.asp?gid=1480>

MetaOCaml comes with the test suite, directory metalib/test/ in the BER MetaOCaml distribution, which contains many small and a few large examples.

mult.ml [11K]
The multiplication-by-constant example: a more realistic version of the power, illustrating code templates (brackets/escapes), let-insertion, offshoring, real-time specialization with the flavor of JIT, and even multiple-stages.

 

The Design and Implementation of BER MetaOCaml

MetaOCaml is a superset of OCaml extending it with the data type for program code and operations for constructing and executing such typed code values. It has been used for compiling domain-specific languages and automating tedious and error-prone specializations of high-performance computational kernels. By statically ensuring that the generated code compiles and letting us quickly run it, MetaOCaml makes writing generators less daunting and more productive.

The current BER MetaOCaml is a complete re-implementation of the original MetaOCaml by Taha, Calcagno and collaborators. Besides the new organization, new algorithms, new code, BER MetaOCaml adds a scope extrusion check superseding environment classifiers. Attempting to build code values with unbound or mistakenly bound variables (liable to occur due to mutation or other effects) is now caught early, raising an exception with good diagnostics. The guarantee that the generated code always compiles becomes unconditional, no matter what effects were used in generating the code.

We describe BER MetaOCaml stressing the design decisions that made the new code modular and maintainable. We explain the implementation of the scope extrusion check.

References
ber-design.pdf [297K]
The full version of the paper presented at FLOPS 2014. A shorter version is published in the FLOPS Proceedings, Springer's LNCS 8475, pp. 86-102
doi:10.1007/978-3-319-07151-0_6

talk-FLOPS.pdf [196K]
The annotated slides of the talk presented at FLOPS 2014 in Kanazawa, Japan on June 4, 2014.

Let-insertion without pain or fear or guilt

loop_motion_gen.ml [7K]
Loop-invariant code motion with the convenient let-insertion (the main example in the FLOPS 2014 talk)

 

MetaOCaml: Ten Years Later

MetaOCaml is a superset of OCaml for convenient code generation with static guarantees: the generated code is well-formed, well-typed and well-scoped, by construction. Not only the completed generated code always compiles; code fragments with a variable escaping its scope are detected already during code generation. MetaOCaml has been employed for compiling domain-specific languages, generic programming, automating tedious specializations in high-performance computing, generating efficient computational kernels and embedded programming. It is used in education, and served as inspiration for several other metaprogramming systems.

Most well-known in MetaOCaml are the types for values representing generated code and the template-based mechanism to produce such values, a.k.a., brackets and escapes. MetaOCaml also features cross-stage persistence, generating ordinary and mutually-recursive definitions, first-class pattern-matching and heterogeneous metaprogramming.

The extant implementation of MetaOCaml, first presented at FLOPS 2014, has been continuously evolving. We describe the current design and implementation, stressing particularly notable additions. Among them is a new, efficient, the easiest to retrofit translation from typed code templates to code combinators. Scope extrusion detection unexpectedly brought let-insertion, and a conclusive solution to the 20-year--old vexing problem of cross-stage persistence.

References
design-10.pdf [386K]
The paper published in Functional and Logic Programming. FLOPS 2024. Lecture Notes in Computer Science, vol. 14659, pp. 219-236, 2024. Springer
doi:10.1007/978-981-97-2300-3_12

mult.ml [11K]
The running example of the paper, designed to show off various MetaOCaml facilities, not just brackets and escapes.

 

Implementing staging

An attractive approach to implementing a staged language is to add staging to an existing language, to take advantage of existing code generators, parsers, libraries and users. This section describes three ways of adding staging to a typed functional language, the last of which is actually implemented in MetaOCaml.

The most straightforward way to extend a functional language with staging is to add staging forms to the abstract syntax tree (AST) and adjust the parser; add staging forms to the typed AST modifying the type checker accordingly; add staging forms to the intermediate language and account for them in the code generator. We have to modify everything -- we might as well re-implement the language from scratch. But there is a far simpler way.

The simplest approach is to pre-process away the brackets into expressions built with code combinators. For example, the MetaOCaml expression

    fun x -> .<fun y -> .~x * y + 1>.
is to be translated into the following plain OCaml code
    fun x -> lam "y" (fun y -> (add (mul x y) (int 1)))
which, when evaluated, will generate the code.

The code combinators, or primitive code generators, are typed and have the following signature:

    type 'a cod
    val int: int -> int cod
    val add: int cod -> int cod -> int cod
    val mul: int cod -> int cod -> int cod
    val lam: string -> ('a cod -> 'b cod) -> ('a->'b) cod
    val app: ('a->'b) cod -> ('a cod -> 'b cod)

This translation can be done by camlp4 or a stand-alone pre-processor. The base system (OCaml) can be used as it is without any modifications. Best of all, the translation approach is consistent with typing: if the result of pre-processing, which is plain OCaml, type-checks according to the ordinary OCaml rules, the source MetaOCaml code is well-typed according to the MetaOCaml rules. This simple approach works surprisingly well: Scala's Lightweight Modular Staging (LMS) is based on similar ideas. Scheme's implementation of quasi-quote is also quite similar; only it does not process lambda- or let-forms in any special way.

There are complications, for example, dealing with special forms such as conditionals and loops. We have to introduce thunks. Pattern-matching forms and type annotations pose a nasty problem, which can be hacked away, e.g., by translating to deconstructors and special type annotation functions (like Haskell's asTypeOf). The paper Scala-Virtualized talks about such translations in detail. Alas, the polymorphic let like .<let f = fun x -> x in ...>. is (was!) a show-stopper. The let-binding within brackets becomes after translation the lambda-binding, which cannot be polymorphic. Although OCaml supports first-class polymorphism, it requires type annotations -- which means the translation cannot be a macro expansion/preprocessing done before the type-checking. Polymorphism in general is problematic in this simple approach: since a code value is translated to an expression, a polymorphic code value (e.g., let f = .<fun x -> x>.) will be translated into the monomorphic expression, due to the value restriction.

The remaining approach is to translate the staging away after type-checking. For example, fun x -> .<fun y -> .~x * y + 1>. is first type-checked, according to the modified OCaml rules to account for brackets and escapes. It is then translated to

    fun x -> build_fun_simple "y" (fun y ->
               mkApp (mkIdent "+")
                  [mkApp (mkIdent "*") [x; y];
                   mkConst 1])
The future-stage code is type-checked properly, including polymorphic constructs. The brackets and escapes do get translated away, but during the type checking. A future-stage bound variable of type t becomes the present-stage bound variable y, but at the type t code.

This is the approach taken in MetaOCaml. With staging annotations eliminated after the type checking, the OCaml back-end, the optimizer and the code generator, are used as they are. We must however modify the OCaml type checker to keep track of the stage (bracketing) level and to type check within brackets. For the former, we make a global reference cell for the current level, `incremented' and `decremented' when encountering a bracket or an escape. Type-checking the future-stage code is almost the same as the present-stage code with one exception. Identifiers bound by future-stage binding forms should be annotated with their stage. BER N102 does this annotation by attaching an attribute to the value_description, to tell the staging level of the value.

The result of a code-generating expression is a value of the type 'a cod. A code value could represent bits of low-level code; however, they are very difficult to compose. For example, depending on the context .<1>. may end up being a part of a machine instruction or a part of the data segment, with different representations. Since the generated code is assuredly well-typed, it does not have to be type-checked. Code combinators could therefore produce bits of the post-typechecking intermediate language. This language (type-annotated AST -- in OCaml, Typedtree) is also difficult to compose: maintaining the type annotations and their type environment is a chore. MetaOCaml code can generate arbitrarily many let-expressions. If code combinators produce the type-annotated AST, they have to extend the type environment with freshly bound identifiers and their types. If the generated let expressions are polymorphic, code generators should be refreshing the type variables. Realizing code values as an untyped AST is the simplest. AST can be pretty-printed to the surface syntax and then compiled as any other source code.

References
A surprisingly simple implementation of staging
The outlined simple approach does actually work, and could be used to implement a practical two-stage language. Even the problem of generating polymorphic let-bindings is solved -- but only very recently.

New translation from nested quasiquotes to code-generation combinators: efficient, typed, easier to retrofit
It is implemented since BER N114.

 

Data constructor restriction

Algebraic data types are one of the salient features of OCaml, which, alas, have not been considered in staged calculi. The theory therefore gives no guidance on staging the code that contains constructors of algebraic, user defined data types, such as the following:
    type foo = Foo | Bar of int
    .<let x = Foo in match x with Bar z -> z>.
The generated code, which can be stored in a file, is let x = Foo in match x with Bar z -> z. Compiling this file will fail since Foo and Bar are not declared. The problem is how to put a data type declaration into the generated code, which is syntactically an expression and hence cannot contain declarations.

The old MetaOCaml dealt with the problem by modifying the AST representing the generated code and adding the field for declarations (actually, the entire type environment). Such a change sent ripples of modifications throughout the type checker, and was one of the main reasons for the divergence of MetaOCaml from OCaml, which contributed to MetaOCaml's demise.

BER MetaOCaml since N100 eschews such AST modification, imposing a constructor restriction instead: all data constructors and record labels used within brackets must come from the types that are declared in separately compiled modules. The restriction follows from the observation that OCaml has no problem compiling code such as Some 1 with constructors of the (standard) data types. The following expressions all satisfy the restriction:

    .<true>.
    .<raise Not_found>.
    .<Some [1]>.
    .<{Complex.re = 1.0; im = 2.0}>.
    let open Complex in .<{re = 1.0; im = 2.0}>.
    .<let open Complex in {re = 1.0; im = 2.0}>.
because the data types bool, option, list, Complex.t are either in Stdlib or defined in the (separately compiled) standard library. External type declarations like those of Complex.t are found in complex.cmi, which can be looked up when the generated code is compiled. On the other hand, the following fail the restriction and are rejected by BER MetaOCaml:
    type foo = Bar
    .<Bar>.

    module Foo = struct exception E end
    .<raise Foo.E>.
The type declaration foo or the module declaration Foo must be moved into an interface file and compiled separately. The compiled interface must also be available at run-time: either placed into the same directory as the executable, or somewhere within the OCaml library search path -- as if it were the standard library type.

However there is a way to include type declarations within the Parsetree.expression tree. For example,

    type foo = Foo | Bar of int
    .<let x = Foo in match x with Bar z -> z>.
could produce the following generated code
    let module M = 
      struct
        type foo = Foo | Bar of int
      end
    in let open M in
    let x = Foo in match x with Bar z -> z
which compiles without problems. Local modules in OCaml do effectively let declarations appear in expressions. The generated code becomes the closure over the data type environment. Therefore, the constructor restriction will be lifted in some future release. On the other hand, from the early experience the restriction does not seem to be very limiting, or hard to satisfy.

 

Moving free variables around but not letting them escape

MetaOCaml lets us manipulate open code. This section describes the complexities and trade-offs in making sure all free variables in such code will eventually be bound, by their intended binders. BER MetaOCaml, since N101, reverses the choice of its predecessors and trades a type-level check for a stronger and more informative dynamic scope-extrusion check.

In MetaOCaml, program code is first-class: it can be passed as arguments to functions, returned as their result, stored in mutable cells, thrown in exceptions, etc. That code may contain free variables, which must however occur in the typing environment in order for the MetaOCaml program to type check. One may think that free variables will, therefore, eventually be bound and the generated code will be closed. Running it should never fail because of an unbound variable. Alas, a well-typed BER MetaOCaml program may still attempt, when executed, to run an open code or construct ill-scoped code -- the code with a free variable that `escaped' its binder and hence will remain unbound or, worse, bound accidentally. In BER MetaOCaml, these attempts are detected early, aborting the execution of the generator with an informative error message. Previous versions of MetaOCaml used type system, so-called environment classifiers, to prevent some, rarely occurring, attempts to run open code. Alas, the other attempts went undetected and ill-scoped code was generated. Not any more.

Manipulating open code is overshadowed by two dangers. First, the mere presence of the run operation in the language brings the risk of applying it to the code that we have not yet finished constructing. Here is the characteristic example:

    .<fun x y -> .~(let z = run .<x+1>. in .<z>.)>.
Prior to BER N101, MetaOCaml rejected such code at type checking time, with the help of so-called environment classifiers (Taha and Nielsen, POPL 2003). Back then, the type of an int code value was ('c,int) code, where 'c is the classifier. If the code value is closed, the classifier is generalizable. The running example was rejected with the type error:
    .<fun x y -> .~(let z = .! .<x+1>. in .<z>.)>.;;
                               ^^^^^^^
    .! error: 'a not generalizable in ('a, int) code
(prior to BER N101, the operation to run code was spelled .!).

BER type checks the running example. Executing the resulting generator however aborts with the run-time exception:

    .<fun x y -> .~(let z = run .<x+1>. in .<z>.)>.;;
    Exception:
    Failure
     "The code built at Characters 29-32:
      .<fun x y -> .~(let z = run .<x+1>. in .<z>.)>.;;
                                   ^^^
     is not closed: identifier x_1 bound at Characters 6-7:
      .<fun x y -> .~(let z = run .<x+1>. in .<z>.)>.;;
            ^
    is free".
The error message is informative, telling now the name of the free variable and pointing out, in the generator source code, the binder that should have bound the variable.

Moving open code around may result in ill-scoped code in another way, when the open code is stored in mutable cells, thrown in exceptions or captured in delimited continuations. Alas, environment classifiers cannot not detect such errors and bad things did happen. Here is an example using old MetaOCaml from 2006 (version 3.09.1 alpha 030). (The problem can be illustrated simpler, but the following example is more realistic and devious.)

    let c =
     let r = ref .<fun z->z>. in 
     let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in 
     .<fun x -> .~f (.~(!r) 1)>. ;;
    (*
    val c : ('a, '_b -> int) code =
      .<fun x_4 -> ((fun x_2 -> 0) ((fun y_3 -> x_2) 1))>.
    *)

One must look hard to see that x_2 is actually unbound in the resulting code (shown in comments, as the output from the interpreter). So-called scope extrusion happened. Only when we attempt to run that code we see that something went wrong somewhere.

    .! c;;
    (*
    Characters 77-78:
      Unbound value x_2

    Exception: Trx.TypeCheckingError.
    *)

Since we get an error anyway (without much diagnostics though), one may discount the severity of the problem. Alas, sometimes scope extrusion results in no error -- just in wrong results. A simple variation plasters over the problem:

    let c1 =
     let r = ref .<fun z->z>. in 
     let _ = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in 
     !r;;
    (*
    val c1 : ('a, '_b -> '_b) code = .<fun y_3 -> x_2>.
    *)
We then use c1 to construct the code c2:
    let c2 = .<fun y -> fun x -> .~c1>.;;
    (*
    val c2 : ('a, 'b -> 'c -> '_d -> '_d) code =
      .<fun y_1 -> fun x_2 -> fun y_3 -> x_2>.
    *)
which contains no unbound variables and can be run without problems.
    (.! c2) 1 2 3;;
    (* - : int = 2 *)

It is most likely that the user did not intend for fun x -> in c2 to bind x of c1. This is the blatant violation of lexical scope. And yet we get no error or other overt indication that something went wrong.

BER MetaOCaml has none of this. Although the type system still permits code values with escaped variables, attempting to use such code in any way -- splice, print or run -- immediately raises an exception. For example, entering the expression c above in the top-level BER MetaOCaml gives:

    Exception:
    Failure
     "Scope extrusion detected at Characters 127-137:
           .<fun x -> .~f (.~(!r) 1)>. ;;
                          ^^^^^^^^^^
     for code built at Characters 80-90:
           let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in
                                        ^^^^^^^^^^
     for the identifier x_2 bound at Characters 65-66:
           let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in
                         ^".
Although the expression c type checked, its evaluation was aborted with an exception and so no code was constructed. Prior to BER N100, c successfully evaluated to the code with an unbound variable. Not only the scope extrusion is detected now; it is reported with an informative message, telling which variable got away, which part of the generator was supposed to bind it, and where it eloped. In the second example, although c1 evaluates successfully (although printing the result reports scope extrusion), splicing c1 immediately aborts any further code generation. The exception again tells the name of the escaped variable, x, and the place in the source code of the generator where that variable was supposed to be bound. No ill-scoped code is generated as the result.

Let us review the trade-offs. Prior to N101, MetaOCaml relied on environment classifiers to prevent, at type-checking time, only one class of errors: attempts to run not yet constructed code. Environment classifiers do not stop scope extrusion and the generation of ill-scoped code. From experience, the error the classifiers do prevent is rare (partly because the complex typing of run made it difficult to use except at the top level, at which point the code is always fully constructed). The extra type parameter of the code type proved to be a nuisance: every data structure that may contain code has to have a type with the environment classifier. Generator abstractions were cumbersome to define and use. Environment classifiers often gave false positives, because the value restriction prevented the generalization of the classifier.

The scope-extrusion check, since BER N101, is dynamic -- it acts when the generator is run rather than when it is compiled. Replacing a type error with a dynamic error is regrettable. On the up side, the generator stops before producing bad code, throwing an exception with a fairly detailed and helpful error message, pointing out the variable that got away. Since the error is an exception, we can obtain the exception stack backtrace and figure out exactly which part of the generator let that variable leak. Before BER N100, we discover the scope-extrusion problem, in the best case, only when compiling the generated code and being taken aback by a compilation error. It could be quite a challenge in figuring out which part of the generator was to blame. The scope-extrusion check requires no extension to the type system and so is easier to implement and maintain.

Thus BER MetaOCaml N101 removed environment classifiers since they give good protection (a type error) against only rare errors, while being cumbersome always. The scope extrusion check, albeit dynamic, is comprehensive and informative, ensuring well-formedness even in the presence of effects. To repeat, since N100, BER MetaOCaml unconditionally guarantees that if code is successfully generated and can be shown or saved to the file, etc -- the code is well-typed and well-scoped. It shall compile without type errors.

 

Many ways to run the code

BER MetaOCaml is structured as the `kernel' and the `user-level'. The former type-checks and compiles staging annotations. Handling the generated code is up to user-level libraries. For example, metalib in the BER MetaOCaml distribution prints, byte- and natively-compiles code values. User-level MetaOCaml code can be changed and extended without recompiling MetaOCaml. Furthermore, changes in OCaml will have less effect on the user-level code.

BER MetaOCaml comes with the `kernel' interface.

    type 'a code                            (* abstract: possibly open code *)
    type 'a closed_code = private Parsetree.expression
   
    val close_code : 'a code -> 'a closed_code
    val open_code  : 'a closed_code -> 'a code

The latter is total, the former does a scope extrusion check. Since closed code is essentially OCaml AST, after closing the generated code, the user may examine and `run' it in many ways. One way of running the code is printing it. The user-level Codelib library comes with the following interface:

    val print_code         : Format.formatter -> 'a code -> unit
    val print_closed_code  : Format.formatter -> 'a closed_code -> unit
   
    val format_code : Format.formatter -> 'a closed_code -> unit
   
    val print_code_as_ast : 'a closed_code -> unit

The former two functions pretty-print the code; print_code is the composition of close_code and print_closed_code. These two functions are installed as printers of code values by the MetaOCaml toplevel. The function format_code is like print_closed_code, but omits the outer brackets. It comes handy when saving the generated code into a file, to compile later. The last function prints the closed code value as a parse tree, for debugging.

To execute the code, the user-level MetaOCaml currently provides val run_bytecode : 'a closed_code -> 'a or val run_native : 'a closed_code -> 'a to run closed code by bytecode compiling (viz. natively compiling) it and then executing. The function Runcode.run : 'a code -> 'a is the composition of close_code and run_bytecode or run_native.

More run-like functions are easy to add: they are regular functions to be linked with or loaded at the toplevel as any other library functions. There is no need to recompile MetaOCaml any more. The old MetaOCaml had so-called `off-shoring' to translate the generated code to C and Fortran. For example, it was used to generate FFT kernels. The generated C code was good enough to plug as it was into the FFTW benchmarking and testing framework. During the overhaul the offshoring has been left for clean-up and re-writing. It is now being developed as a separate library. Since offshoring and many such transformations are type-directed, for their benefit MetaOCaml provides typecheck_code : 'a closed_code -> Typedtree.expression to type check the generated code. Since version N107, offshoring back, in a more general way, as the translation from a subset of the (generated) OCaml code to a simple imperative intermediate language, which may then be transcribed to C, OpenCL, LLVM, etc.

 

Native (natively-compiled) MetaOCaml

Since version N104, BER MetaOCaml provides native compilation of the generated code, if the generator itself is natively compiled (with metaocamlopt). Natively-compiled generated code is then linked back into the running program -- hence supporting run-time code specialization.

It should be stressed that run-time code specialization is only one way of using MetaOCaml. The other is actually more prominent, being used in three largish projects: stream library with the performance of the hand-written code; generating the optimal code for FFT kernels and for Gaussian Elimination. Gaussian Elimination is a large family of the algorithms, parameterized by the domain (int, float, polynomial), by the choice of pivoting, by the need to compute the determinant, rank, the permutation matrix, by the layout of matrices and by a half a dozen more aspects. Because Gaussian Elimination is often used in inner loops, the performance matters a great deal. Therefore, all this parameterization has to be done statically. With Jacques Carette we generated many GE procedures for particular parameter choices. The generated code was stored in files, to be compiled in a library, by ocamlopt. Although the code generator was byte-code OCaml, the generated code could be stored and compiled by the optimizing, native OCaml. We used the byte-code run, but only for testing. The FFT project was similar: the generator was byte-code, the byte-code run was used for testing, the generated code was C and compiled with the Intel C compiler.

To stress, the byte-code generator produces the code to be compiled by any suitable compiler. The mode (byte-code or native) of the generator and of the generated code are not related.

References
Stream Fusion, to Completeness

Generating optimal FFT code and relating FFTW and Split-Radix

Multi-stage programming with functors and monads: eliminating abstraction overhead from generic code

 

More brackets

MetaOCaml brackets .<e>. as well as escapes .~e are internally represented as extension nodes metaocaml.bracket and metaocaml.escape, resp. That is, .<e>. is just the syntactic sugar for [%metaocaml.bracket e].

There are also two bracket-related attributes. They are meant to be attached to a bracket expression. The attributes, together with the bracket expression they are attached to, may be regarded as a special sort of bracket. Currently we provide no sugar for these `special' brackets (to be less disruptive of the baseline OCaml syntax). Only one of the following bracket-modifying attributes may be attached to a bracket.

The first bracket-modifying attribute is metaocaml.functionliteral. It is an assertion that the bracketed expression is actually a literal function, such as .<function p1 -> e1 ... pn -> en>. or .<fun p -> e>. Such function literals act as first-class patterns. The type-checker checks that the bracketed expression is really the function literal, and if so, gives it a more refined type: pat_code. For example:

    let t1 = .<fun x -> x>. [@metaocaml.functionliteral]
     (* val t1 : ('a -> 'a) Trx.pat_code = <abstr> *)
   
    let t2 = .<function [] -> true | (_::_) -> false>. [@metaocaml.functionliteral]
    (* val t : ('a list -> bool) Trx.pat_code = <abstr> *)
   
    let t3 = .<let x = function () -> () in x>. [@metaocaml.functionliteral]
   
      let t3 = .<let x = function () -> () in x>. [@metaocaml.functionliteral];;
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Error: The expression does not appear to be a functional literal as requested
The comments show the inferred type. Attaching the attribute to a code expression that does not look as a function literal raises the type error, shown in the last example, for t3. For more detail, see the section on first-class pattern-matching.

The other bracket-modifying attribute is metaocaml.value. It is also an assertion, that a bracketed expression actually represents a value: a constant, variable reference, constant tuple, etc. If this is indeed the case, the typechecker gives the bracketed expression a more refined type: val_code. Otherwise, an error is reported. For more detail and example, see the section on let-insertion below.

 

First-class pattern-matching

MetaOCaml N104 and later versions can generate match ... with ... expressions with the statically unknown number of clauses. This is the first step towards building arbitrary patterns at run-time. This first-class pattern-matching has not been seen before, in any of the earlier versions of MetaOCaml or MetaML. Although the low-level Template Haskell (outside the convenient quasi-quotations) and Scheme also let us build pattern-matching expressions dynamically, they offer no guarantees whatsoever that the result makes any sense. MetaOCaml does. The match statement built by make_match below is assured to be well-formed and type-correct. Rhiger's pattern-combinators give type-safety as well. However, their types are extremely complicated. A mistake in using the combinator gives nearly incomprehensible error messages. In contrast, first-class pattern-matching of MetaOCaml is staggeringly simple.

First-class pattern-matching in MetaOCaml is utmost simple because OCaml already has the convenient syntax for pattern-matching clauses. Indeed, the familiar function literal such as .<function [1] -> 0>. is a pattern-match clause. We only need to ensure that a code value of a function type is indeed a literal (which is assured by the metaocaml.functionliteral annotation), and collect such literals to make a match statement. Here is an example:

    (* defined in codelib.mli *)
   val make_match : 'a code -> ('a -> 'w) pat_code list -> 'w code
   
   let p =
     [.<function [1] -> 0>.   [@metaocaml.functionliteral];
      .<fun [x]      -> x+1>. [@metaocaml.functionliteral]
     ]
   
   let c = make_match .<[2]>.  @@
     [.<fun []       -> 0>.   [@metaocaml.functionliteral] ] @
     p @
     [.<fun (x::y)   -> x+1>. [@metaocaml.functionliteral] ]
which generates the following match expression:
    val c : int code = .<
      match [2] with
      | [] -> 0
      | 1::[] -> 0
      | x_14::[] -> x_14 + 1
      | x_12::y_13 -> x_12 + 1>. 

For more examples, see the pattern.ml file in the test directory of the metalib library. What is left is the generation of statically unknown patterns, like matching on a statically unknown constants. Actually, this is already possible, in a way, through when guards (see the sample code mentioned earlier).

References
Morten Rhiger: Type-Safe Pattern Combinators
J. Functional Programming, v19, March 2009, pp. 145-156

 

Let-insertion as a primitive

The importance of let-insertion is hard to overstate. It was to implement let-insertion that the delimited control library delimcc was originally developed. Let-insertion has been the main motivation for combining staging with effects, such as delimited control. We refer to those papers for more discussion.

Given the importance and prevalence of let-insertion, it seemed fitting to offer it natively, as a MetaOCaml primitive. We no longer need external dependencies such as delimited control or effect libraries. MetaOCaml already tracks free variables of an expression, for the sake of scope-extrusion detection. Let-insertion is not that complicated to add (although not as simple either, due to nesting).

Since version N104, BER MetaOCaml supports let-insertion natively, without any external libraries (and without delimited control). First, it introduces the type 'a val_code, a subtype of 'a code, which represents program fragments that are syntactically values (such as an identifier reference, a constant, a lambda-expression). Such fragments can be freely spliced as many times as needed without worrying about duplicating effects or evaluation order. Code-generator--functions that take arguments of 'a val_code type behave as call-by-value functions when the staging annotations are erased. One can always convert, or coerce, val_code to code.

One way of producing 'a val_code values is using brackets with the metaocaml.value attribute, as in

    fun x -> .<(1,function y -> y + .~x)>. [@metaocaml.value]
   (* inferred type: int code -> (int * (int -> int)) val_code *)
If the annotated bracketed expression is not actually a value, syntactically, a type error is raised.

Another way of producing 'a val_code is by

    genletv : 'a code -> 'a val_code
This function first checks that its argument is already a value, whose construction requires no significant allocations (i.e., cheap to duplicate). If so, it is returned as it is, but of the 'a val_code type. Otherwise, genlet generates a let-statement, binding the argument to a fresh future-stage variable and returning the code that refers to that variable. Variable references are clearly values. That is, genlet exp inserts let freshname = exp in ... somewhere in the generated code and returns .<freshname>.. The let is inserted right under the binder for the `latest' free variable contained in exp. For example,
    .<fun x y -> x + y + .~(genlet .<x+1>.)>.
generates
    - : (int -> int -> int) code = .<
     fun x_1  -> let t_3 = x_1 + 1  in fun y_2  -> (x_1 + y_2) + t_3>.
(Here, genlet is the composition of genletv with the coercion from the 'a val_code result of the latter to 'a code, so that it can be spliced in.)

The names of let-bound-variables are at the discretion of genlet; one could still give a hint, through the optional ~name argument:

    .<fun x y -> x + y + .~(genlet ~name:"v" .<x+1>.)>.
which generates fun x_1 -> let v_3 = x_1 + 1 in fun y_2 -> (x_1 + y_2) + v_3>.

Version N111 lets the user control where the let-statement is to be inserted, which is often necessary when the expression to let-bind and move is effectful. For example, consider

    let sum_up = 
      let body arr =
        let arr = genlet arr in
        let sum = genlet ~name:"sum" .<ref 0>. in
        .<for i=0 to Array.length .~arr - 1 do
           .~sum := ! .~sum + (.~arr).(i)
          done;
        ! .~sum>.
      in .<fun x -> .~(body .<Array.map succ x>.)>.
which results in
    val sum_up : (int array -> int) code = .<
      let sum_6 = Stdlib.ref 0 in
      fun x_4 ->
        let t_5 = Stdlib.Array.map Stdlib.succ x_4 in
        for i_7 = 0 to (Stdlib.Array.length t_5) - 1 do
          sum_6 := ! sum_6 + Stdlib.Array.get t_5 i_7
        done;
        ! sum_6>.
where the accumulator sum_6 is bound outside the function. Hence it accumulates across all invocations of sum_up, which is probably not what the programmer intended. The simple modification:
    let sum_up = 
      let body arr =
        with_locus @@ fun locus ->
        let arr = genlet arr in
        let sum = genlet ~name:"sum" ~locus .<ref 0>. in
        .<for i=0 to Array.length .~arr - 1 do
           .~sum := ! .~sum + (.~arr).(i)
          done;
        ! .~sum>.
      in .<fun x -> .~(body .<Array.map succ x>.)>.
keeps the accumulator within the function:
    val sum_up : (int array -> int) code = .<
      fun x_8 ->
        let t_10 = Stdlib.Array.map Stdlib.succ x_8 in
        let sum_11 = Stdlib.ref 0 in
        for i_12 = 0 to (Stdlib.Array.length t_10) - 1 do
          sum_11 := ! sum_11 + Stdlib.Array.get t_10 i_12
        done;
        ! sum_11>.
The form with_locus specifies the upper bound for the scope of genlet ~locus bindings; the bindings may be inserted sooner to avoid scope extrusion.

Finally, BER N111 offers the facility for the `immediate' let-insertion:

    let t = letl ~name:"v" .<1+2>. (fun v -> .<4 + .~v>.)
which generates .<let v_3 = 1 + 2 in 4 + v_3>.. It is essentially genlet that immediately follows with_locus. As with the ordinary genlet, if the expression to bind is simple (is a value whose construction does not involve big memory allocations), no let is generated.

For more examples, see genlet.ml in the test directory of the metalib library. The file fib.ml gives a larger example, of memoization and let-insertion.

References
Shifting the Stage: Staging with Delimited Control

 

Generating mutually recursive definitions

BER N111 brings let rec insertion: the facility to generate groups of mutually recursive bindings of statically unknown size. Such groups cannot be generated using brackets and escapes: only expressions can be bracketed and spliced, but an individual binding var = exp is not an expression. Therefore, BER N111 introduces two primitives
    type locus_rec
    
    val with_locus_rec : (locus_rec -> 'w code) -> 'w code
    
    val mkgenlet : ?name:string ->          (* hint of names to bind *)
      locus_rec ->                          (* locus created by with_locus_rec *)
      ('key->'key->bool) ->                 (* memo key comparison *)
      (* The result: genletrec, which takes a key and 
           a function to produce the expression to let-bind
      *)
      (('key -> ('a->'b) code) -> 'key -> ('a->'b) code)
Here, with_locus_rec, similarly to with_locus described earlier, marks the scope of the let rec that will eventually be generated. The function mkgenlet produces the `memoizing genletrec'; what it does is better explained on an example.

The example is Neil Jones' benchmark for partial evaluators: specializing the Ackermann function.

    let rec ack m n =
      if m = 0 then n+1 else
      if n = 0 then ack (m-1) 1 else 
      ack (m-1) (ack m (n-1))
The challenge is to produce the code specialized to a particular value of m. Looking at the earlier power example and adding staging annotations produces something like the following:
    let sack m =
      let rec loop m =
        if m = 0 then .<fun n -> n + 1>. else
        .<fun n -> if n = 0 then .~(loop (m-1)) 1
        else .~(loop (m-1)) (.~(loop m) (n-1))>.
      in loop m
Alas, applying this generator to any positive m results in an infinite loop. One clearly sees that for non-zero m, the result of loop m depends on loop m. We need to build a recursive definition; in fact, several definitions, for all m from 0 to the user-given value. That is actually quite easy: use mkgenlet to obtain the `binding producer' g and blindly insert it before every loop invocation:
    let sack m =
      with_locus_rec @@ fun l ->
      let g = mkgenlet l (=) in
      let rec loop m =
        if m = 0 then .<fun n -> n + 1>. else
        .<fun n -> if n = 0 then .~(g loop (m-1)) 1
        else .~(g loop (m-1)) (.~(g loop m) (n-1))>.
      in g loop m
Then sack 2 produces the desired
    val sac2 : (int -> int) code = .<
      let rec h_6 n_7 = n_7 + 1
      and h_4 n_5 = if n_5 = 0 then h_6 1 else h_6 (h_4 (n_5 - 1))
      and h_2 n_3 = if n_3 = 0 then h_4 1 else h_4 (h_2 (n_3 - 1)) in h_2>. 
References
PEPM 2019 and 2022 papers with the detailed discussion, eventually arriving at the interface implemented in MetaOCaml

test/genletrec.ml in the BER MetaOCaml distribution shows many more examples, including the generation of finite automata from their schematic descriptions

 

Cross-stage persistence

Bracketed expressions (i.e., templates of the code to generate) may be open: contain free variables, which are called Cross-Stage Persistent (CSP) variables. Borrowing and modifying the example from Tim Sheard's ``Accomplishments and Research Challenges in Meta-programming'' (SAIG 2001)
    let f x = x + 1
    let z = .<f 4 + 5>.
    let f x = not x in Runcode.run z
The code value bound to z contains the free variable f. It refers to the int->int function that was in scope when z was defined, rather than to f:bool->bool which is in scope when z is run. That is, CSP variables, as other variables, are lexically scoped. CSP are pervasive: for example, the code value z above contains yet another free variable (CSP): it is +, which is bound in Stdlib. All standard library operations are hence CSPs.

Cross-stage persistence may be looked at differently: as an ability to use the same expression, value or identifier at several stages: now and later. For example, Stdlib.succ increments integers now: succ 1. The future-stage expression .<succ 1>. -- which will increment 1 later, when this code is compiled and run -- refers to the same succ. Such a reference is CSP. In this section we describe CSP in the original and BER MetaOCaml, their problems, and their current implementation. Cross-stage persistent expressions, present in some calculi can always be emulated in terms of CSP identifiers such as x in .<.~(let x = e in .<x>.)>.. Therefore, in the following we will be talking only about cross-stage persistent identifiers.

Staged calculi such as lambda^a (Taha and Nielsen, 2003) and lambda_i (Calcagno, Moggi and Taha, 2004) impose no restrictions on CSP. Neither does MetaOCaml, letting us write

    let polylift x = .<x>.;;
      val polylift : 'a -> 'a code = <fun>
Such polymorphic CSPs are well-defined in staged calculi that formalize a small and pure subset of real-world staged languages. Outside that subset, problems emerge. For example, what is the meaning, if any, for CSP of mutable cells or input channels:
    let r = ref 0 in
    let x = run .<let () = r := 1 in !r>. in
    (x, !r)
   
    let c = open_in "/etc/motd" in .<c>.
      in_channel code = .<(* cross-stage persistent value (id: c) *)>.
Is r above shared or copied between the stages? What can a presently-opened file possibly refer to when the the code .<c>. is run later? After all, the generated code may be saved into a file and compiled into a library, to be linked into programs that may run on a different, from the generator, computer. Generating libraries of specialized computational kernels is an important application of MetaOCaml. On the other hand, cross-stage persistence of mutable cells and even of open channels makes sense in the case of run-time code specialization.

It seems important to separate CSP of global identifiers from CSP of local ones. Incidentally, Template Haskell has such a distinction. The expression .<succ>. referring to the global identifier succ seems no different from .<1>. After all, 1 and succ are both constants, of different types. The definition of a calculus or a logic typically begins with the phrase: assume a set Sigma of constants (such as numerals, succ etc.) of appropriate types and arities. In the code

    let x = 1 in
    let y = .<1>. in ...
1 on the first line refers to the constant of Sigma (``the standard library constant'', so to speak) from the present-stage calculus and 1 on the second line is the element of Sigma at the next stage. The code
    let x = succ in
    let y = .<succ>. in ...
should be interpreted the same way. It is crucial that Sigma at both stages be the same. In other words, the computer that runs the generator and the computer that runs the generated code must have the same standard library, which is broadly understood to include user libraries and .cmi and .cmo files. Thus CSP of global identifiers should be understood as references to the standard library. In other words, such CSPs are common knowledge, present at all stages.

CSP of a local identifier should be interpreted as a lift (serialization), valid only at specific types. Hence, let x = 1 in .<x+2>. should be interpreted as let x = 1 in .<.~(lift_int x) +2>.. Here lift_int is an internal function that serializes an integer value. Serialization always copies the value. What if the value is non-serializable or hardly serializable? And how to add lift-like functions, say, for user-defined data types? These are very difficult questions, which are finally answered.

First of all, `global' variables (bound outside the current compilation unit) in code values are interpreted as references to those `library' identifiers. There are no restrictions on their types. The following concerns `local' CSPs: identifiers in code values bound in the current computation unit:

In other words, easily serializable are interpreted by value (copied); all others, by reference.

Since N107 MetaOCaml also provides a set of lifting modules of the signature

    module type lift = sig
      type t
      val lift: t -> t code
    end
to help lift values of option, lift, array types. Users are encouraged to use such lifters in their code if they need to serialize particular CSPs.
References
Reconciling Abstraction with High Performance: A MetaOCaml approach
Section 3.2.1 `A Digression on CSP' of the book describes explicit lifting, which is implemented in N107 and after (including lifting of lists, option values and, in particular, arrays)

 

Further plans

MetaOCaml has an extensive research and development program. On the research front is developing a staged calculus that accounts for objects, modules, and GADTs. Since modules and objects are binding forms, their staging is complex. We must perform appropriate renaming and check that the generated identifiers are not left unbound in the resulting code. Also interesting is the integration of staging and modules (which have recently became first-class). Since both staging and modules are abstractions over code, can staging even be used as a module system? Another item is to prove that the restriction on CSP avoids the unsoundness problems with generalizations.

User-land development is most extensive: we need more ways to `run' code values, by translating them to C, Fortran, LLVM, Verilog. MetaOCaml can then be used for generating libraries of specialized C, etc. code. User-land MetaOCaml development has the least cost of entry (no need to know OCaml internals) and is wide open for developers. Everyone can contribute, everyone is welcome.

 

A brief history of (BER) MetaOCaml

BER MetaOCaml sprang out the original MetaOCaml and owes a great deal to the lessons of its implementation. BER MetaOCaml builds on what has been proven to work. The motivation for it becomes clear from MetaOCaml's history.

A curious file in the original MetaOCaml CVS repository lists several short notes from the meeting in Paris on September 29, 2000 between Xavier Leroy and Walid Taha. One note stands out: ``we (Xavier!) were able to compile and run "run bracket 1"!!'' One may take this moment as the birth of MetaOCaml.

MetaOCaml started as a fork of OCaml along the lines of MetaML. Designed and architectured by Walid Taha and developed by Cristiano Calcagno, MetaOCaml had reached its current form by 2003. The environment classifiers appeared at the end of that year. Native code compilation and offshoring (translating the generated code into C and Fortran) were added in 2004.

As MetaOCaml was being developed, new versions of the mainline OCaml were released with sometimes many fixes and improvements. The MetaOCaml team tracked new OCaml releases and merged the changes into MetaOCaml. (The MetaOCaml version number has as its base OCaml's release version.) The merge was not painless. For example, any new function in the OCaml compiler that dealt with Parsetree (AST) or Typedtree has to be modified to handle MetaOCaml extensions to these data structures. The merge process became more and more painful as the two languages diverged. For instance, native code compilation that first appeared in MetaOCaml 3.07 relied on SCaml, a large set of patches to OCaml by malc at pulsesoft.com to support dynamic linking. OCaml 3.08 brought many changes that were incompatible with SCaml. Therefore, in MetaOCaml 3.08 the native compilation mode was broken. The mode was brought back in the Summer 2005, by re-engineering the SCaml patch and implementing the needed parts of dynamic linking without any modification to the OCaml code. The revived native compilation has survived through the end.

In 2006, MetaOCaml became the basis for Concoqtion. The increasing divergence between OCaml and MetaOCaml made it harder and harder to merge the changes. With the funds for the project dried up and the daunting prospect of merging many changes that appeared in OCaml 3.10 and 3.11, the development of MetaOCaml has ceased. Its last released version was 3.09.1 alpha 030.

Although the original development team could no longer support MetaOCaml, the users of the language, Jacques Carette in particular, could not bear it fall by the wayside. At the Jacques Carette's urging, the merge of OCaml 3.11 changes was undertaken in February 2010. The lack of funding and limited resources made it imperative to keep close to the mainline OCaml and avoid the divergence at all cost. Everything not directly related to staging was removed, including tag elimination and Concoqtion. 23 fewer files in the OCaml distribution were affected by MetaOCaml. Walid Taha suggested to name the new, bare-minimum line of MetaOCaml as BER MetaOCaml. Its first version N002 was released on March 1, 2010.

BER MetaOCaml established the separation of the kernel from the user-level. Like with an OS kernel, changes to the MetaOCaml kernel require `rebooting' (recompiling the MetaOCaml) and may render the system unusable. User-level changes affect only particular programs and are easier to develop. The MetaOCaml kernel is responsible for producing and type-checking code values. User-level deals with processing, or `running' code values: pretty-printing, executing, compiling to byte-code, to native code, to C, etc. Programmers may write new ways of processing code values -- for example, to compile them to LLVM or JavaScript -- without modifying OCaml or BER MetaOCaml. In the BER N002 distribution, the kernel included the 59KB patch to OCaml distribution and the 54KB-long typing/trx.ml; user-level consisted of a pretty-printer (59KB source code) and the support for byte-code execution of the generated code and the custom top level (3KB, all in all). There was also 20KB of test code.

The 59KB patch to OCaml seems large. Part of it is necessary. Since BER MetaOCaml extends the AST data type, Parsetree, with variants Pexp_bracket, Pexp_escape, Pexp_run, Pexp_cspval for the staging constructs, any part of OCaml that processes the AST has to be modified. Quite many parts of a compiler deal with AST. In a multi-stage language, each expression is associated with a stage level, present-stage or one of the progressively future stages. The original MetaOCaml stored the level in an extra field val_level of the value_description record that maintained type and other information for each identifier. Since the map from identifiers to value_description is the typing environment, it is used all throughout the type checker. Therefore, the addition of a new field required great many changes. A new OCaml version modifies the type checker quite heavily. Integrating all these modifications into MetaOCaml, accounting for the new field val_level, is a hard job. It is avoidable however: we may associate identifiers with levels differently, by adding a new map to the environment that tells the level of each future-stage identifier. Any identifier not in the domain of that map is assumed present-stage. The value_description record then remains intact. BER MetaOCaml N100 (see below) takes this route. The original MetaOCaml modified the AST further, to store the information about data constructors used in the generated code (actually, the AST that corresponded to the generated code contained the entire present-stage environment of the type-checker.) Great many patches to OCaml came that way. BER N100 imposed the constructor restriction as described elsewhere on this page, eliminating the extra AST modifications and the need to patch OCaml further. Thus, if one takes as the goal bringing MetaOCaml closer to OCaml, the amount of modifications to OCaml can indeed be minimized -- as BER N100 has shown.

OCaml release 3.12 brought unexpectedly many new features to the language, for example, first-class modules. OCaml 4.0 added GADTs and dynamic linking. Integrating that many changes into BER MetaOCaml N002 was too depressing to contemplate. BER MetaOCaml N100, developed in January 2013 and released on January 31, is a clean-state re-implementation of MetaOCaml. Rather than merging OCaml 4.0 into the MetaOCaml branch, BER N100 started anew. It took the OCaml distribution, extended its AST with the staging constructs, and kept adding code until the constructs were fully handled -- until there were no inexhaustive pattern-matching throughout the OCaml code. The exhaustiveness check was very helpful. Great care was taken to modify as fewer OCaml data structures as possible. Here is the break-out of the BER N100 distribution: the kernel consists of the 49KB patch to OCaml files (modifying 28 OCaml files, 6 of which trivially) and the 77KB-long completely re-written typing/trx.ml with very many comments. User-level includes the 58KB pretty-printer (heavily modified by Jacques Carette), and a number of small files to support top-level and run, 3KB all in all. There are 54KB of tests, including, for the first time, a regression test suite. BER N100 is much less invasive into OCaml: compare the size of the patch to the main OCaml type-checker typing/typecore.ml, which contains the bulk of the changes for type checking the staging constructs. In the previous version BER N004, the patch had 564 lines of additions, deletions and context; now, only 328 lines.

BER MetaOCaml N101, released in November 2013, cleaned-up MetaOCaml further. The positive experience with the scope extrusion check has made it possible to remove environment classifiers, and hence notably simplify the type checking of staging constructs. The operation to run the code no longer required dedicated type checking rules. In BER N101, it became an ordinary function, moved out of the MetaOCaml kernel. Internally, BER N101 improved the scope extrusion check and optimized the translation of staging constructs, especially binding forms. Pretty-printing of the code has become part of OCaml and hence no longer has to be maintained separately.

The version N102 released on January 1, 2015 was another significant re-write, although with hardly any user-visible changes. The BER N102 code makes the extensive use of attributes, the new feature of OCaml 4.02. Staging annotations -- brackets, escapes and CSP -- are now really annotations: attributes on the Parsetree and Typedtree. A specifically Typedtree attribute marks non-expansive nodes (the non-expansiveness check is performed before the bracket-translation but is used only after). An attribute on value_description tells the staging level of the value. There is no longer a separate Typedtree traversal pass, after the type checking, to translate brackets and escapes. That means that for staging-annotation-free code, MetaOCaml has no substantial overhead. BER N102 has started on revamping cross-stage-persistence; quite a few CSP have become printable and, mainly, serializable. Non-serializable CSP were the only impediment to native MetaOCaml. The number of files in the OCaml distribution that are patched by MetaOCaml fell down from 32 to mere 7. The patch size is reduced to 34KB. It is now a distinct possibility that MetaOCaml becomes just a regular library or a plug-in, rather being a fork.

Version N104 (January 1, 2017) brings back native MetaOCaml and implements two novel features, not seen in MetaOCaml or MetaML before: first-class pattern-matching and let-insertion as a primitive.

Version N107 (October 5, 2018) brings back offshoring, in a different way, and adds explicit lifting. The code type is no longer special and pre-defined (which means only 4 OCaml files are modified by MetaOCaml: lexer, parser, pretty-printer, and typecore.ml).

Version N111 (October 5, 2020) adds let rec insertion: the facility to generate mutually recursive definitions whose size is not statically known. It is now possible to control the scope of let-insertion, which is often necessary when the let-bound expression is effectful. The well-scopedness guarantee is still maintained. Offshoring got more polish.

Version N114 (May 11, 2023) re-implements offshoring, adding, among other things, normalization (removing @@ and |>, straightening nested let-bindings, lifting let-bindings were possible and easy). Mutable variables and pointer types are supported with no restrictions. Offshoring is now supported completely, up to printing of C99 C code. The offshoringIR intermediate language is usable outside MetaOCaml (for generating C or WASM, etc.) CSP are re-implemented: non-serializable CSPs are converted to top-level let-bindings. We now use Parsetree extension nodes metaocaml.bracket and metaocaml.escape. The lexer is adaptable now: it lexes .> exactly as OCaml, until it sees the first .< (opening bracket).

Developing and especially maintaining a new language is gigantic work. Engineering a new language as a dialect of a well-maintained one is hard but pays in many ways.

References
The MetaOCaml files: Status report and research proposal
The extended abstract published in the informal proceedings of the 2010 ACM SIGPLAN Workshop on ML

ml2013-talk.pdf [200K]
MetaOCaml lives on: Lessons from implementing a staged dialect of a functional language
The annotated slides of the talk at the ACM SIGPLAN Workshop on ML, September 22, 2013. Boston, MA, USA

The Design and Implementation of BER MetaOCaml

 

Credits

BER MetaOCaml is the complete re-implementation of the original MetaOCaml.

The original MetaOCaml was funded primarily by the NSF project ``ITR/SY(CISE): Putting Multi-Stage Annotations to Work'' led by Walid Taha. Most of the original development and implementation of staging was done by Cristiano Calcagno. Edward Pizzi has implemented the pretty-printing of code, later modified Jacques Carette. Xavier Leroy, INRIA, helped with the compiler specifics.

I am very grateful to Jacques Carette, Walid Taha, Jeremy Yallop, Yukiyoshi Kameyama and the members of his lab at the University of Tsukuba, Chung-chieh Shan, Cedric Cellier and Jun Inoue for many helpful discussions and encouragement. Reports and testing by Tran Minh Quang, Mustafa Elsheik, and Nicolas Ojeda Bar are appreciated.