MetaOCaml has been successfully used for the most optimal stream fusion, generating fast parsers and regular-expression matchers, specializing numeric and dynamic programming algorithms, building FFT kernels, compilers for an image processing and database query DSLs, OCaml server pages, generating families of specialized basic linear algebra and Gaussian Elimination routines, and high-performance stencil computations.
MetaOCaml is distinguished from Camlp4 and other such macro-processors by: hygiene (maintaining lexical scope); generating assuredly well-typed code; and the integration with higher-order functions, modules and other abstraction facilities of ML, hence promoting modularity and reuse of code generators. A well-typed BER MetaOCaml program generates only well-typed programs: The generated code shall compile without type errors. There are no longer problems of puzzling out a compilation error in the generated code (which is typically large, obfuscated and with unhelpful variable names). The errors are reported in terms of the generator rather than the generated code.
MetaOCaml is purely generative: the generated code is treated as a black box and cannot be examined. One can put code together but cannot take it apart. Pure generativity significantly simplifies the type system and strengthens the static assurances. It may seem that pure generativity precludes code optimizations. Fortunately, that is not the case. The literature has many examples of generating optimal code with MetaOCaml.
BER MetaOCaml is in the spirit of the original MetaOCaml by Taha, Calcagno and contributors but has been completely re-written using different algorithms and techniques. The old MetaOCaml code still runs with little or no changes. Overall, BER MetaOCaml aims at the most harmonious way of integrating staging with OCaml, minimizing the differences. It becomes easier to keep MetaOCaml up-to-date and ensure its long-term viability.
BER MetaOCaml has been re-structured to minimize the amount of changes
to the OCaml type-checker and to separate the `kernel' from the
`user-level'. The kernel is a set of patches and additions to OCaml,
responsible for producing and type-checking code values. The
processing of built code values -- so-called `running' -- is
user-level. Currently the user-level metalib
supports printing,
type-checking, byte- and native- compiling and linking of code
values. In addition, the generated code can be `offshored' -- converted to
the intermediate, imperative language which may then be transcribed to
C, OpenCL, LLVM, etc.
BER MetaOCaml is almost entirely backwards compatible with the original MetaOCaml. To the user, the major differences of BER MetaOCaml are as follows (see below on this page for the detailed explanation):
Smaller visible differences are special `subtypes' for the code that represents a function literal or a value; more helpful error messages; better printing of cross-stage persistent values and the full support for labeled arguments. A long-standing problem with records with polymorphic fields has fixed itself. The BER MetaOCaml code is now extensively commented, and has a regression test suite. MetaOCaml has never supported generating code with local or first-class modules and objects. However one may use modules and objects in the code generator.
By relying on attributes, the feature of OCaml 4.02, BER N102 has
become much closer integrated with OCaml (and BER N114 is more
still). It is instructive to compare the amount of changes BER
MetaOCaml makes to the OCaml distribution. An earlier version (BER
N101) modified 32 OCaml files. BER N114 modifies only 4 (that
number could be further reduced to one; the only file with
nontrivial modifications is typing/typecore.ml
). It is now a
distinct possibility that -- with small hooks that may be provided in
the future OCaml versions -- MetaOCaml becomes just a regular library
or a plug-in, rather being a fork.
The file NOTES.txt in the BER MetaOCaml distribution describes the features of BER MetaOCaml in more detail and outlines directions for further development. The R&D program is extensive, ranging from the staging theory to the developing of translators of the generated code to C, LLVM, and other targets. There is fun and tasks for everyone. Hopefully the release of BER MetaOCaml would incite using and researching typed meta-programming.
The Design and Implementation of BER MetaOCaml
Stream Fusion, to Completeness
The latest, most notable application
Reconciling Abstraction
with High Performance: A MetaOCaml approach
The MetaOCaml book
ml2013-talk.pdf [200K]
MetaOCaml lives on:
Lessons from implementing a staged dialect of a functional language
The annotated slides of the talk at the ACM SIGPLAN Workshop on ML
September 22, 2013. Boston, MA, USA
$ opam update $ opam switch 4.14.1+BER $ eval `opam config env`If you use Tuareg, the following options are reported to be useful:
Tuareg group (setq tuareg-support-metaocaml t) Tuareg Faces group Tuareg Font Lock Multistage Face
One may also install BER MetaOCaml manually. Befitting the separation
between the kernel and the user-level, BER MetaOCaml is distributed as
a set of patches to OCaml plus a separate library metalib
. The
installation involves patching the OCaml distribution, building the
OCaml system system as usual, and compiling metalib
and the BER
MetaOCaml toplevel. The patched OCaml compiler is fully source- and
binary-compatible with OCaml and can process OCaml code, with staging
constructs or without. MetaOCaml can also use OCaml-compiled modules
and libraries as they are. BER MetaOCaml is available as:
x^n
. (See also
mult.ml
for a more realistic and illustrative example.)
In OCaml:
let square x = x * x let rec power n x = if n = 0 then 1 else if n mod 2 = 0 then square (power (n/2) x) else x * (power (n-1) x) (* val power : int -> int -> int = <fun> *)(The comment underneath the code shows the reply of the MetaOCaml toplevel.)
Suppose our program has to compute x^7
many times. We may define
let power7 x = power 7 x (* val power7 : int -> int = <fun> *)to name and share the partial application.
In MetaOCaml, we may also specialize the power function to a
particular value n
, obtaining the code which will later receive x
and compute x^n
. We re-write power n x
annotating expressions as
computed `now' (when n
is known) or `later' (when x
is given).
let rec spower n x = if n = 0 then .<1>. else if n mod 2 = 0 then .<square .~(spower (n/2) x)>. else .<.~x * .~(spower (n-1) x)>. (* val spower : int -> int code -> int code = <fun> *)The two annotations, or staging constructs, are brackets
.< e >.
and
escape .~e
. Brackets .< e >.
`quasi-quote' the expression e
,
annotating it as computed later. Escape .~e
, which must be used within
brackets, tells that e
is computed now but produces the result for
later. That result, the code, is spliced-in into the containing bracket.
The inferred type of spower
is different. The result is no longer an
int
, but int code
-- the code of expressions that compute an int
.
The type of spower
spells out which argument is received now, and which
later: the future-stage argument has the code type. There is an
alternative syntax for brackets and escapes, as extension nodes. For
example, spower
above can also be written as
let rec spower n x = if n = 0 then [%metaocaml.bracket 1] else if n mod 2 = 0 then [%metaocaml.bracket square [%metaocaml.escape spower (n/2) x]] else [%metaocaml.bracket [%metaocaml.escape x] * [%metaocaml.escape spower (n-1) x]] (* val spower : int -> int code -> int code = <fun> *)Depending on the editor and its set-up, this form may actually be easier to enter and display.
To specialize spower
to 7, we define
let spower7_code = .<fun x -> .~(spower 7 .<x>.)>.;; (* val spower7_code : (int -> int) code = .< fun x_1 -> x_1 * (csp_square_3 (x_1 * (csp_square_2 (x_1 * 1))))>. *)and obtain the code of a function that will receive
x
and return
x^7
. Code, even of functions, can be printed, which is what MetaOCaml
toplevel just did. The print-out contains so-called cross-stage persistent
values, or CSP, which `quote' present-stage values such as square
to
be used later. One may think of CSP as references to `external
libraries' -- only in our case the program acts as a library for the
code it generates. If square
were defined in a separate file, say,
sq.ml
, then its occurrence in spower7_code
would have been printed
simply as Sq.square
.
To use thus specialized x^7
now, in our code, we should `run'
spower7_code
, applying the function Runcode.run : 'a code -> 'a
.
The function compiles the code and links it back to our program.
open Runcode;; let spower7 = run spower7_code (* val spower7 : int -> int = <fun> *)
The specialized spower7
has the same type as the partially applied
power7
above. They behave identically. However, power7 x
will do
the recursion on n
, checking n
's parity, etc. In contrast, the specialized
spower7
has no recursion (as can be seen from spower7_code
). All
operations on n
have been done when the spower7_code
was generated;
the result is spower7
that straight-line multiplies x
.
The generated spower7_code
is rather simple (as generated code tends
to be), and hence can be easily converted to C, via
offshoring, described in detail elsewhere. We need to tell
however about the square
function we defined. Let us
assume there exists a C function sqr
, for example, defined as
static int sqr(int const x) { return x * x; }
, and we want to map
our square
to it.
let _ = let module M = struct (* Tell offshoring library to map *) open Offshoring (* our Sq.square to C's sqr *) include DefaultConv let id_conv path name = match (path,name) with | ("Sq","square") -> "sqr" | _ -> id_conv path name end in Offshoring.offshore_to_c ~cnv:(module M) ~name:"power7" spower7_codeThe result (which could have been saved into a file) is:
int power7(int const x_1){ return (x_1 * sqr(x_1 * sqr(x_1 * 1))); }
MetaOCaml supports an arbitrary number of later stages, letting us write not only code generators but also generators of code generators, etc.
Walid Taha: A Gentle Introduction to Multi-stage Programming
A.P. Ershov: On the partial computation principle
Information Processing Letters, 1977, v6, N2, pp. 38-41.
This is the English version of the paper that started partial
evaluation as a research area. The power example, in Fig. 1 of the paper,
can be seen on the following manuscript scan
<http://ershov.iis.nsk.su/files/archive/fold0241/241_75.gif>
The Information Processing Letters paper is the English version of
A. P. Ershov: A Theoretical Principle of System Programming
Doklady AN SSSR (Soviet Mathematics Doklady), 1977, v18,N2,
pp.312--315.
For space reasons, the power example was omitted from the
Doklady AN SSSR paper, but it can be seen in the notes to
the November 1976 draft
<http://ershov.iis.nsk.su/files/archive/fold0241/241_17.gif>
This is perhaps the first occurrence of the power function specialization.
For other drafts of this paper, see
``The Ershov Archive for the history of computing''
<http://ershov.iis.nsk.su/archive/eaindex.asp?gid=1480>
MetaOCaml comes with the test suite, directory
metalib/test/
in the BER MetaOCaml distribution,
which contains many small and a few large examples.
mult.ml [11K]
The multiplication-by-constant example:
a more realistic version of the power,
illustrating code templates (brackets/escapes), let-insertion,
offshoring, real-time specialization with the flavor of JIT,
and even multiple-stages.
The current BER MetaOCaml is a complete re-implementation of the original MetaOCaml by Taha, Calcagno and collaborators. Besides the new organization, new algorithms, new code, BER MetaOCaml adds a scope extrusion check superseding environment classifiers. Attempting to build code values with unbound or mistakenly bound variables (liable to occur due to mutation or other effects) is now caught early, raising an exception with good diagnostics. The guarantee that the generated code always compiles becomes unconditional, no matter what effects were used in generating the code.
We describe BER MetaOCaml stressing the design decisions that made the new code modular and maintainable. We explain the implementation of the scope extrusion check.
talk-FLOPS.pdf [196K]
The annotated slides of the talk presented at FLOPS 2014 in
Kanazawa, Japan on June 4, 2014.
Let-insertion without pain or fear or guilt
loop_motion_gen.ml [7K]
Loop-invariant code motion with the convenient let-insertion
(the main example in the FLOPS 2014 talk)
Most well-known in MetaOCaml are the types for values representing generated code and the template-based mechanism to produce such values, a.k.a., brackets and escapes. MetaOCaml also features cross-stage persistence, generating ordinary and mutually-recursive definitions, first-class pattern-matching and heterogeneous metaprogramming.
The extant implementation of MetaOCaml, first presented at FLOPS 2014, has been continuously evolving. We describe the current design and implementation, stressing particularly notable additions. Among them is a new, efficient, the easiest to retrofit translation from typed code templates to code combinators. Scope extrusion detection unexpectedly brought let-insertion, and a conclusive solution to the 20-year--old vexing problem of cross-stage persistence.
mult.ml [11K]
The running example of the paper, designed to show off various MetaOCaml
facilities, not just brackets and escapes.
The most straightforward way to extend a functional language with staging is to add staging forms to the abstract syntax tree (AST) and adjust the parser; add staging forms to the typed AST modifying the type checker accordingly; add staging forms to the intermediate language and account for them in the code generator. We have to modify everything -- we might as well re-implement the language from scratch. But there is a far simpler way.
The simplest approach is to pre-process away the brackets into expressions built with code combinators. For example, the MetaOCaml expression
fun x -> .<fun y -> .~x * y + 1>.is to be translated into the following plain OCaml code
fun x -> lam "y" (fun y -> (add (mul x y) (int 1)))which, when evaluated, will generate the code.
The code combinators, or primitive code generators, are typed and have the following signature:
type 'a cod val int: int -> int cod val add: int cod -> int cod -> int cod val mul: int cod -> int cod -> int cod val lam: string -> ('a cod -> 'b cod) -> ('a->'b) cod val app: ('a->'b) cod -> ('a cod -> 'b cod)
This translation can be done by camlp4 or a stand-alone pre-processor. The base system (OCaml) can be used as it is without any modifications. Best of all, the translation approach is consistent with typing: if the result of pre-processing, which is plain OCaml, type-checks according to the ordinary OCaml rules, the source MetaOCaml code is well-typed according to the MetaOCaml rules. This simple approach works surprisingly well: Scala's Lightweight Modular Staging (LMS) is based on similar ideas. Scheme's implementation of quasi-quote is also quite similar; only it does not process lambda- or let-forms in any special way.
There are complications, for example, dealing with special forms such
as conditionals and loops. We have to introduce thunks.
Pattern-matching forms and type annotations pose a nasty problem,
which can be hacked away, e.g., by translating to deconstructors and
special type annotation functions (like Haskell's asTypeOf
). The paper
Scala-Virtualized talks about such translations in detail. Alas, the
polymorphic let
like .<let f = fun x -> x in ...>.
is
(was!) a show-stopper.
The let-binding within brackets becomes after translation
the lambda-binding, which cannot be polymorphic.
Although OCaml supports first-class polymorphism, it requires type
annotations -- which means the translation cannot be a macro
expansion/preprocessing done before the type-checking.
Polymorphism in
general is problematic in this simple approach: since a code value
is translated to
an expression, a polymorphic code value (e.g., let f = .<fun x -> x>.
) will be translated into the monomorphic expression, due to
the value restriction.
The remaining approach is to translate the staging away after
type-checking. For example, fun x -> .<fun y -> .~x * y + 1>.
is first
type-checked, according to the modified OCaml rules to account for
brackets and escapes. It is then translated to
fun x -> build_fun_simple "y" (fun y -> mkApp (mkIdent "+") [mkApp (mkIdent "*") [x; y]; mkConst 1])The future-stage code is type-checked properly, including polymorphic constructs. The brackets and escapes do get translated away, but during the type checking. A future-stage bound variable of type
t
becomes the present-stage
bound variable y
, but at the type t code
.
This is the approach taken in MetaOCaml. With staging annotations eliminated
after the type checking, the OCaml back-end, the
optimizer and the code generator, are used as they are. We must however
modify the OCaml type checker to keep track of the stage (bracketing)
level and to type check within brackets. For the former, we make a global
reference cell for the current level, `incremented' and `decremented'
when encountering a bracket or an escape. Type-checking the future-stage
code is almost the same as the present-stage code with one exception.
Identifiers bound by future-stage binding forms should be annotated
with their stage. BER N102 does this annotation by attaching an attribute to
the value_description
, to tell the staging level of the value.
The result of a code-generating expression is a value of
the type 'a cod
. A code value could represent
bits of low-level code; however, they
are very difficult to compose. For example, depending on the context
.<1>.
may end up being a part of a machine instruction or a part of
the data segment, with different representations. Since the
generated code is assuredly well-typed, it does not have to be
type-checked. Code combinators could therefore produce bits of the
post-typechecking intermediate language. This
language (type-annotated AST -- in OCaml, Typedtree
) is
also difficult to compose: maintaining the type annotations and their
type environment is a chore. MetaOCaml code can generate
arbitrarily many let
-expressions. If code combinators
produce the type-annotated AST, they have to extend the type
environment with freshly bound identifiers and their types. If
the generated let
expressions are polymorphic, code generators
should be refreshing the type variables.
Realizing code values as an untyped AST is the simplest. AST
can be pretty-printed to the surface syntax and then compiled as
any other source code.
New translation from nested quasiquotes to
code-generation combinators: efficient, typed, easier to retrofit
It is implemented since BER N114.
type foo = Foo | Bar of int .<let x = Foo in match x with Bar z -> z>.The generated code, which can be stored in a file, is
let x = Foo in match x with Bar z -> z
. Compiling this file will fail
since Foo
and Bar
are not declared. The problem is how to put a
data type declaration into the generated code, which is
syntactically an expression and hence cannot contain
declarations.
The old MetaOCaml dealt with the problem by modifying the AST representing the generated code and adding the field for declarations (actually, the entire type environment). Such a change sent ripples of modifications throughout the type checker, and was one of the main reasons for the divergence of MetaOCaml from OCaml, which contributed to MetaOCaml's demise.
BER MetaOCaml since N100 eschews such AST modification, imposing a
constructor restriction instead: all data constructors and record
labels used within brackets must come from the types that are declared
in separately compiled modules. The restriction follows from the
observation that OCaml has no problem
compiling code such as Some 1
with constructors of the (standard) data
types. The following expressions all satisfy the restriction:
.<true>. .<raise Not_found>. .<Some [1]>. .<{Complex.re = 1.0; im = 2.0}>. let open Complex in .<{re = 1.0; im = 2.0}>. .<let open Complex in {re = 1.0; im = 2.0}>.because the data types
bool
, option
, list
, Complex.t
are
either in Stdlib
or defined in the (separately compiled) standard
library. External type declarations like those of Complex.t
are
found in complex.cmi
, which can be looked up when the generated code
is compiled. On the other hand, the following fail the restriction and are
rejected by BER MetaOCaml:
type foo = Bar .<Bar>. module Foo = struct exception E end .<raise Foo.E>.The type declaration
foo
or the module declaration Foo
must be moved
into an interface file and compiled separately. The compiled interface
must also be available at run-time: either placed into the same directory as the
executable, or somewhere within the OCaml library search path -- as if
it were the standard library type.
However there is a way to include type declarations within the
Parsetree.expression
tree. For example,
type foo = Foo | Bar of int .<let x = Foo in match x with Bar z -> z>.could produce the following generated code
let module M = struct type foo = Foo | Bar of int end in let open M in let x = Foo in match x with Bar z -> zwhich compiles without problems. Local modules in OCaml do effectively let declarations appear in expressions. The generated code becomes the closure over the data type environment. Therefore, the constructor restriction will be lifted in some future release. On the other hand, from the early experience the restriction does not seem to be very limiting, or hard to satisfy.
In MetaOCaml, program code is first-class: it can be passed as arguments to functions, returned as their result, stored in mutable cells, thrown in exceptions, etc. That code may contain free variables, which must however occur in the typing environment in order for the MetaOCaml program to type check. One may think that free variables will, therefore, eventually be bound and the generated code will be closed. Running it should never fail because of an unbound variable. Alas, a well-typed BER MetaOCaml program may still attempt, when executed, to run an open code or construct ill-scoped code -- the code with a free variable that `escaped' its binder and hence will remain unbound or, worse, bound accidentally. In BER MetaOCaml, these attempts are detected early, aborting the execution of the generator with an informative error message. Previous versions of MetaOCaml used type system, so-called environment classifiers, to prevent some, rarely occurring, attempts to run open code. Alas, the other attempts went undetected and ill-scoped code was generated. Not any more.
Manipulating open code is overshadowed by two dangers. First, the mere presence of the run operation in the language brings the risk of applying it to the code that we have not yet finished constructing. Here is the characteristic example:
.<fun x y -> .~(let z = run .<x+1>. in .<z>.)>.Prior to BER N101, MetaOCaml rejected such code at type checking time, with the help of so-called environment classifiers (Taha and Nielsen, POPL 2003). Back then, the type of an int code value was
('c,int) code
, where 'c
is the classifier.
If the code value is closed, the classifier is generalizable. The running
example was rejected with the type error:
.<fun x y -> .~(let z = .! .<x+1>. in .<z>.)>.;; ^^^^^^^ .! error: 'a not generalizable in ('a, int) code(prior to BER N101, the operation to run code was spelled
.!
).
BER type checks the running example. Executing the resulting generator however aborts with the run-time exception:
.<fun x y -> .~(let z = run .<x+1>. in .<z>.)>.;; Exception: Failure "The code built at Characters 29-32: .<fun x y -> .~(let z = run .<x+1>. in .<z>.)>.;; ^^^ is not closed: identifier x_1 bound at Characters 6-7: .<fun x y -> .~(let z = run .<x+1>. in .<z>.)>.;; ^ is free".The error message is informative, telling now the name of the free variable and pointing out, in the generator source code, the binder that should have bound the variable.
Moving open code around may result in ill-scoped code in another way, when the open code is stored in mutable cells, thrown in exceptions or captured in delimited continuations. Alas, environment classifiers cannot not detect such errors and bad things did happen. Here is an example using old MetaOCaml from 2006 (version 3.09.1 alpha 030). (The problem can be illustrated simpler, but the following example is more realistic and devious.)
let c = let r = ref .<fun z->z>. in let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in .<fun x -> .~f (.~(!r) 1)>. ;; (* val c : ('a, '_b -> int) code = .<fun x_4 -> ((fun x_2 -> 0) ((fun y_3 -> x_2) 1))>. *)
One must look hard to see that x_2
is actually unbound in the resulting
code (shown in comments, as the output from the interpreter). So-called
scope extrusion happened. Only when we attempt to run that code
we see that something went wrong somewhere.
.! c;; (* Characters 77-78: Unbound value x_2 Exception: Trx.TypeCheckingError. *)
Since we get an error anyway (without much diagnostics though), one may discount the severity of the problem. Alas, sometimes scope extrusion results in no error -- just in wrong results. A simple variation plasters over the problem:
let c1 = let r = ref .<fun z->z>. in let _ = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in !r;; (* val c1 : ('a, '_b -> '_b) code = .<fun y_3 -> x_2>. *)We then use
c1
to construct the code c2
:
let c2 = .<fun y -> fun x -> .~c1>.;; (* val c2 : ('a, 'b -> 'c -> '_d -> '_d) code = .<fun y_1 -> fun x_2 -> fun y_3 -> x_2>. *)which contains no unbound variables and can be run without problems.
(.! c2) 1 2 3;; (* - : int = 2 *)
It is most likely that the user did not intend for fun x ->
in c2
to
bind x
of c1
. This is the blatant violation of lexical scope. And yet we
get no error or other overt indication that something went wrong.
BER MetaOCaml has none of this. Although the type system still
permits code values with escaped variables, attempting to use such
code in any way -- splice, print or run -- immediately raises an
exception. For example, entering the expression c
above in the top-level BER
MetaOCaml gives:
Exception: Failure "Scope extrusion detected at Characters 127-137: .<fun x -> .~f (.~(!r) 1)>. ;; ^^^^^^^^^^ for code built at Characters 80-90: let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in ^^^^^^^^^^ for the identifier x_2 bound at Characters 65-66: let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in ^".Although the expression
c
type checked, its evaluation was aborted with an
exception and so no code was constructed. Prior to BER N100, c
successfully
evaluated to the code with an unbound variable. Not only the scope extrusion
is detected now; it is reported with an informative message,
telling which variable got away, which part of the generator was supposed to
bind it, and where it eloped. In the second example, although c1
evaluates
successfully (although printing the result reports scope extrusion),
splicing c1
immediately aborts any further code generation. The exception
again tells the name of the escaped variable, x
, and the place in the source
code of the generator where that variable was supposed to be bound. No
ill-scoped code is generated as the result.
Let us review the trade-offs. Prior to N101, MetaOCaml relied on
environment classifiers to prevent, at type-checking time, only one class
of errors: attempts to run not yet constructed code. Environment
classifiers do not stop scope extrusion and the generation of
ill-scoped code. From experience, the error the classifiers do
prevent is rare (partly because the complex typing of run
made it difficult to use except at the top level, at which point the
code is always fully constructed). The extra type parameter of the
code
type proved to be a nuisance: every data structure that
may contain code has to have a type with the environment classifier.
Generator abstractions were cumbersome to define and use. Environment
classifiers often gave false positives, because the value restriction
prevented the generalization of the classifier.
The scope-extrusion check, since BER N101, is dynamic -- it acts when the generator is run rather than when it is compiled. Replacing a type error with a dynamic error is regrettable. On the up side, the generator stops before producing bad code, throwing an exception with a fairly detailed and helpful error message, pointing out the variable that got away. Since the error is an exception, we can obtain the exception stack backtrace and figure out exactly which part of the generator let that variable leak. Before BER N100, we discover the scope-extrusion problem, in the best case, only when compiling the generated code and being taken aback by a compilation error. It could be quite a challenge in figuring out which part of the generator was to blame. The scope-extrusion check requires no extension to the type system and so is easier to implement and maintain.
Thus BER MetaOCaml N101 removed environment classifiers since they give good protection (a type error) against only rare errors, while being cumbersome always. The scope extrusion check, albeit dynamic, is comprehensive and informative, ensuring well-formedness even in the presence of effects. To repeat, since N100, BER MetaOCaml unconditionally guarantees that if code is successfully generated and can be shown or saved to the file, etc -- the code is well-typed and well-scoped. It shall compile without type errors.
metalib
in the BER MetaOCaml distribution prints, byte- and natively-compiles code
values. User-level MetaOCaml code can be changed and extended without
recompiling MetaOCaml. Furthermore, changes in OCaml will have less
effect on the user-level code.
BER MetaOCaml comes with the `kernel' interface.
type 'a code (* abstract: possibly open code *) type 'a closed_code = private Parsetree.expression val close_code : 'a code -> 'a closed_code val open_code : 'a closed_code -> 'a code
The latter is total, the former does a scope extrusion check.
Since closed code is essentially OCaml AST, after closing the generated code,
the user may examine and `run' it in many ways.
One way of running the code is printing it. The user-level Codelib
library comes with the following interface:
val print_code : Format.formatter -> 'a code -> unit val print_closed_code : Format.formatter -> 'a closed_code -> unit val format_code : Format.formatter -> 'a closed_code -> unit val print_code_as_ast : 'a closed_code -> unit
The former two functions pretty-print the code; print_code
is the composition of close_code
and print_closed_code
. These two
functions are installed as printers of code values by the MetaOCaml
toplevel. The function
format_code
is like print_closed_code
, but omits the outer brackets.
It comes handy when saving the generated code into a file,
to compile later. The last function prints the closed code value
as a parse tree, for debugging.
To execute the code, the user-level MetaOCaml currently provides
val run_bytecode : 'a closed_code -> 'a
or
val run_native : 'a closed_code -> 'a
to run closed code by bytecode compiling (viz. natively compiling)
it and then executing.
The function Runcode.run : 'a code -> 'a
is the composition of
close_code
and run_bytecode
or run_native
.
More run-like functions are easy to add: they are regular functions to
be linked with or
loaded at the toplevel as any other library functions.
There is no need to recompile MetaOCaml
any more. The old MetaOCaml had so-called `off-shoring' to translate the
generated code to C and Fortran. For example, it was used to
generate FFT kernels. The generated C code was good enough to plug as
it was into the FFTW benchmarking and testing framework. During the
overhaul the offshoring has been left for clean-up and
re-writing. It is now being developed as a separate library.
Since offshoring and many such transformations are type-directed,
for their benefit MetaOCaml provides
typecheck_code : 'a closed_code -> Typedtree.expression
to type check the generated code.
Since version N107, offshoring back, in a more general way, as the translation
from a subset of the (generated) OCaml code to a simple imperative
intermediate language, which may then be transcribed to C, OpenCL,
LLVM, etc.
It should be stressed that run-time code specialization is only one
way of using MetaOCaml. The other is actually more prominent, being used
in three largish projects: stream library with the performance of the
hand-written code; generating the optimal code for
FFT kernels and for Gaussian Elimination. Gaussian Elimination is a
large family of the algorithms, parameterized by the domain (int
,
float
, polynomial), by the choice of pivoting, by the need to compute
the determinant, rank, the permutation matrix, by the layout of matrices and
by a half a dozen more aspects. Because Gaussian Elimination is often
used in inner loops, the performance matters a great deal. Therefore,
all this parameterization has to be done statically. With Jacques
Carette we generated many GE procedures for particular parameter
choices. The generated code was stored in files, to be compiled in a
library, by ocamlopt
. Although the code generator was
byte-code OCaml, the generated code could be stored and compiled by
the optimizing, native OCaml. We used the byte-code run, but only for
testing. The FFT project was similar: the generator was byte-code, the
byte-code run was used for testing, the generated code was C and
compiled with the Intel C compiler.
To stress, the byte-code generator produces the code to be compiled by any suitable compiler. The mode (byte-code or native) of the generator and of the generated code are not related.
Generating optimal FFT code and relating FFTW and Split-Radix
Multi-stage programming with functors and monads: eliminating abstraction overhead from generic code
.<e>.
as well as escapes .~e
are internally represented as extension nodes metaocaml.bracket
and metaocaml.escape
, resp. That is,
.<e>.
is just the syntactic sugar for [%metaocaml.bracket e]
.
There are also two bracket-related attributes. They are meant to be attached to a bracket expression. The attributes, together with the bracket expression they are attached to, may be regarded as a special sort of bracket. Currently we provide no sugar for these `special' brackets (to be less disruptive of the baseline OCaml syntax). Only one of the following bracket-modifying attributes may be attached to a bracket.
The first bracket-modifying attribute is metaocaml.functionliteral
.
It is an assertion that the bracketed expression is actually a literal
function, such as .<function p1 -> e1 ... pn -> en>.
or .<fun p -> e>.
Such function literals act as first-class patterns. The type-checker
checks that the bracketed expression is really the function literal,
and if so, gives it a more refined type: pat_code
. For example:
let t1 = .<fun x -> x>. [@metaocaml.functionliteral] (* val t1 : ('a -> 'a) Trx.pat_code = <abstr> *) let t2 = .<function [] -> true | (_::_) -> false>. [@metaocaml.functionliteral] (* val t : ('a list -> bool) Trx.pat_code = <abstr> *) let t3 = .<let x = function () -> () in x>. [@metaocaml.functionliteral] let t3 = .<let x = function () -> () in x>. [@metaocaml.functionliteral];; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Error: The expression does not appear to be a functional literal as requestedThe comments show the inferred type. Attaching the attribute to a code expression that does not look as a function literal raises the type error, shown in the last example, for
t3
. For more detail, see the section on
first-class pattern-matching.
The other bracket-modifying attribute is metaocaml.value
.
It is also an assertion, that a bracketed expression actually
represents a value: a constant, variable reference, constant tuple, etc.
If this is indeed the case, the typechecker gives the bracketed
expression a more refined type: val_code
. Otherwise, an error is reported.
For more detail and example, see the section on let-insertion below.
match ... with ...
expressions with the statically unknown number of clauses. This is the
first step towards building arbitrary patterns at run-time. This
first-class pattern-matching has not been seen before, in any of the
earlier versions of MetaOCaml or MetaML. Although the low-level
Template Haskell (outside the convenient quasi-quotations) and Scheme
also let us build pattern-matching expressions dynamically, they offer
no guarantees whatsoever that the result makes any sense. MetaOCaml
does. The match
statement built by make_match
below is assured to
be well-formed and type-correct. Rhiger's pattern-combinators give
type-safety as well. However, their types are extremely complicated. A
mistake in using the combinator gives nearly incomprehensible error
messages. In contrast, first-class pattern-matching of MetaOCaml is
staggeringly simple.
First-class pattern-matching in MetaOCaml is utmost simple because OCaml
already has the convenient syntax for pattern-matching clauses. Indeed,
the familiar function literal such as .<function [1] -> 0>.
is
a pattern-match clause. We only need to ensure that a code value
of a function type is indeed a literal (which is assured by the
metaocaml.functionliteral
annotation), and collect such literals to make
a match
statement. Here is an example:
(* defined in codelib.mli *) val make_match : 'a code -> ('a -> 'w) pat_code list -> 'w code let p = [.<function [1] -> 0>. [@metaocaml.functionliteral]; .<fun [x] -> x+1>. [@metaocaml.functionliteral] ] let c = make_match .<[2]>. @@ [.<fun [] -> 0>. [@metaocaml.functionliteral] ] @ p @ [.<fun (x::y) -> x+1>. [@metaocaml.functionliteral] ]which generates the following
match
expression:
val c : int code = .< match [2] with | [] -> 0 | 1::[] -> 0 | x_14::[] -> x_14 + 1 | x_12::y_13 -> x_12 + 1>.
For more examples, see the pattern.ml
file in the test
directory of
the metalib
library. What is left is the generation of statically unknown
patterns, like matching on a statically unknown constants. Actually,
this is already possible, in a way, through when
guards (see the
sample code mentioned earlier).
Given the importance and prevalence of let-insertion, it seemed fitting to offer it natively, as a MetaOCaml primitive. We no longer need external dependencies such as delimited control or effect libraries. MetaOCaml already tracks free variables of an expression, for the sake of scope-extrusion detection. Let-insertion is not that complicated to add (although not as simple either, due to nesting).
Since version N104, BER MetaOCaml supports let-insertion natively,
without any external libraries (and without delimited control).
First, it introduces the type 'a val_code
, a subtype
of 'a code
, which represents program fragments that are syntactically values
(such as an identifier reference, a constant, a lambda-expression).
Such fragments can be freely spliced as many times as
needed without worrying about duplicating effects or evaluation order.
Code-generator--functions that take arguments of 'a val_code
type behave as call-by-value functions when the staging annotations are
erased. One can always convert, or coerce, val_code
to code
.
One way of producing 'a val_code
values is using brackets with
the metaocaml.value
attribute, as in
fun x -> .<(1,function y -> y + .~x)>. [@metaocaml.value] (* inferred type: int code -> (int * (int -> int)) val_code *)If the annotated bracketed expression is not actually a value, syntactically, a type error is raised.
Another way of producing 'a val_code
is by
genletv : 'a code -> 'a val_codeThis function first checks that its argument is already a value, whose construction requires no significant allocations (i.e., cheap to duplicate). If so, it is returned as it is, but of the
'a val_code
type. Otherwise, genlet
generates a let-statement, binding the argument to a fresh
future-stage variable and returning the code that refers to that
variable. Variable references are clearly values. That is, genlet exp
inserts let freshname = exp in ...
somewhere in the generated
code and returns .<freshname>.
. The let
is inserted right under
the binder for the `latest' free variable contained in exp
. For
example,
.<fun x y -> x + y + .~(genlet .<x+1>.)>.generates
- : (int -> int -> int) code = .< fun x_1 -> let t_3 = x_1 + 1 in fun y_2 -> (x_1 + y_2) + t_3>.(Here,
genlet
is the composition of genletv
with the coercion from
the 'a val_code
result of the latter to 'a code
, so that it can be
spliced in.)
The names of let-bound-variables are at the discretion of genlet
; one
could still give a hint, through the optional ~name
argument:
.<fun x y -> x + y + .~(genlet ~name:"v" .<x+1>.)>.which generates
fun x_1 -> let v_3 = x_1 + 1 in fun y_2 -> (x_1 + y_2) + v_3>.
Version N111 lets the user control where the let
-statement is to
be inserted, which is often necessary when the expression to let-bind and
move is effectful. For example, consider
let sum_up = let body arr = let arr = genlet arr in let sum = genlet ~name:"sum" .<ref 0>. in .<for i=0 to Array.length .~arr - 1 do .~sum := ! .~sum + (.~arr).(i) done; ! .~sum>. in .<fun x -> .~(body .<Array.map succ x>.)>.which results in
val sum_up : (int array -> int) code = .< let sum_6 = Stdlib.ref 0 in fun x_4 -> let t_5 = Stdlib.Array.map Stdlib.succ x_4 in for i_7 = 0 to (Stdlib.Array.length t_5) - 1 do sum_6 := ! sum_6 + Stdlib.Array.get t_5 i_7 done; ! sum_6>.where the accumulator
sum_6
is bound outside the function. Hence it
accumulates across all invocations of sum_up
, which is probably not
what the programmer intended. The simple modification:
let sum_up = let body arr = with_locus @@ fun locus -> let arr = genlet arr in let sum = genlet ~name:"sum" ~locus .<ref 0>. in .<for i=0 to Array.length .~arr - 1 do .~sum := ! .~sum + (.~arr).(i) done; ! .~sum>. in .<fun x -> .~(body .<Array.map succ x>.)>.keeps the accumulator within the function:
val sum_up : (int array -> int) code = .< fun x_8 -> let t_10 = Stdlib.Array.map Stdlib.succ x_8 in let sum_11 = Stdlib.ref 0 in for i_12 = 0 to (Stdlib.Array.length t_10) - 1 do sum_11 := ! sum_11 + Stdlib.Array.get t_10 i_12 done; ! sum_11>.The form
with_locus
specifies the upper bound for the scope of
genlet ~locus
bindings; the bindings may be inserted sooner to
avoid scope extrusion.
Finally, BER N111 offers the facility for the `immediate' let-insertion:
let t = letl ~name:"v" .<1+2>. (fun v -> .<4 + .~v>.)which generates
.<let v_3 = 1 + 2 in 4 + v_3>.
. It is essentially
genlet
that immediately follows with_locus
. As with the ordinary
genlet
, if the expression to bind is simple (is a value whose
construction does not involve big memory allocations), no let
is
generated.
For more examples, see genlet.ml
in the test
directory of
the metalib
library. The file fib.ml
gives a larger example,
of memoization and let-insertion.
var = exp
is not an expression. Therefore, BER
N111 introduces two primitives
type locus_rec val with_locus_rec : (locus_rec -> 'w code) -> 'w code val mkgenlet : ?name:string -> (* hint of names to bind *) locus_rec -> (* locus created by with_locus_rec *) ('key->'key->bool) -> (* memo key comparison *) (* The result: genletrec, which takes a key and a function to produce the expression to let-bind *) (('key -> ('a->'b) code) -> 'key -> ('a->'b) code)Here,
with_locus_rec
, similarly to with_locus
described
earlier, marks the scope of the let rec
that will
eventually be generated. The function mkgenlet
produces the
`memoizing genletrec'; what it does is better explained on an
example.
The example is Neil Jones' benchmark for partial evaluators: specializing the Ackermann function.
let rec ack m n = if m = 0 then n+1 else if n = 0 then ack (m-1) 1 else ack (m-1) (ack m (n-1))The challenge is to produce the code specialized to a particular value of
m
. Looking at the earlier power
example and adding
staging annotations produces something like the following:
let sack m = let rec loop m = if m = 0 then .<fun n -> n + 1>. else .<fun n -> if n = 0 then .~(loop (m-1)) 1 else .~(loop (m-1)) (.~(loop m) (n-1))>. in loop mAlas, applying this generator to any positive
m
results in an
infinite loop. One clearly sees that for non-zero m
, the result of
loop m
depends on loop m
. We need to build a recursive definition;
in fact, several definitions, for all m
from 0 to the user-given value.
That is actually quite easy: use mkgenlet
to obtain the `binding
producer' g
and blindly insert it before every loop
invocation:
let sack m = with_locus_rec @@ fun l -> let g = mkgenlet l (=) in let rec loop m = if m = 0 then .<fun n -> n + 1>. else .<fun n -> if n = 0 then .~(g loop (m-1)) 1 else .~(g loop (m-1)) (.~(g loop m) (n-1))>. in g loop mThen
sack 2
produces the desired
val sac2 : (int -> int) code = .< let rec h_6 n_7 = n_7 + 1 and h_4 n_5 = if n_5 = 0 then h_6 1 else h_6 (h_4 (n_5 - 1)) and h_2 n_3 = if n_3 = 0 then h_4 1 else h_4 (h_2 (n_3 - 1)) in h_2>.
test/genletrec.ml
in the BER MetaOCaml distribution shows many
more examples, including the generation of finite automata from their
schematic descriptions
let f x = x + 1 let z = .<f 4 + 5>. let f x = not x in Runcode.run zThe code value bound to
z
contains the free variable f
. It refers
to the int->int
function that was in
scope when z
was defined, rather than to f:bool->bool
which
is in scope when z
is run. That is, CSP variables, as other
variables, are lexically scoped. CSP are pervasive: for example, the
code value z
above contains yet another free variable (CSP): it is
+
, which is bound in Stdlib. All standard library operations are
hence CSPs.
Cross-stage persistence may be looked at differently: as an
ability to use the same expression,
value or identifier at several stages: now and later. For example,
Stdlib.succ
increments integers now: succ 1
. The future-stage
expression .<succ 1>.
-- which will increment 1 later, when this code is
compiled and run -- refers to the same succ
. Such a reference is
CSP. In this section we describe CSP in the original
and BER MetaOCaml, their problems, and their current implementation.
Cross-stage
persistent expressions, present in some calculi can
always be emulated in terms of CSP identifiers such as x
in
.<.~(let x = e in .<x>.)>.
. Therefore, in the following we will be
talking only about cross-stage persistent identifiers.
Staged calculi such as lambda^a (Taha and Nielsen, 2003) and lambda_i (Calcagno, Moggi and Taha, 2004) impose no restrictions on CSP. Neither does MetaOCaml, letting us write
let polylift x = .<x>.;; val polylift : 'a -> 'a code = <fun>Such polymorphic CSPs are well-defined in staged calculi that formalize a small and pure subset of real-world staged languages. Outside that subset, problems emerge. For example, what is the meaning, if any, for CSP of mutable cells or input channels:
let r = ref 0 in let x = run .<let () = r := 1 in !r>. in (x, !r) let c = open_in "/etc/motd" in .<c>. in_channel code = .<(* cross-stage persistent value (id: c) *)>.Is
r
above shared or copied between the stages? What can a
presently-opened file possibly refer to when the the code .<c>.
is
run later? After all, the generated code may be saved
into a file and compiled into a library, to be linked into programs
that may run on a different, from the generator, computer. Generating
libraries of specialized computational kernels is an important
application of MetaOCaml. On the other hand, cross-stage persistence
of mutable cells and even of open channels makes sense in the case
of run-time code specialization.
It seems important to separate CSP of global
identifiers from CSP of local ones. Incidentally, Template
Haskell has such a distinction.
The expression .<succ>.
referring to the global identifier
succ
seems no different from .<1>.
After all, 1
and succ
are both
constants, of different types. The definition of a calculus or a logic
typically begins with the phrase: assume a set Sigma of
constants (such as numerals,
succ
etc.) of appropriate types and arities. In the code
let x = 1 in let y = .<1>. in ...
1
on the first line refers to the constant of Sigma (``the standard
library constant'', so to speak) from the present-stage calculus and
1
on the second line is the element of Sigma
at the next stage. The code
let x = succ in let y = .<succ>. in ...should be interpreted the same way. It is crucial that Sigma at both stages be the same. In other words, the computer that runs the generator and the computer that runs the generated code must have the same standard library, which is broadly understood to include user libraries and
.cmi
and .cmo
files. Thus CSP of global identifiers
should be understood as references to the standard library. In other
words, such CSPs are common knowledge, present at all stages.
CSP of a local identifier should be interpreted as a lift
(serialization), valid only at specific types. Hence,
let x = 1 in .<x+2>.
should be interpreted as
let x = 1 in .<.~(lift_int x) +2>.
. Here lift_int
is an internal
function that serializes an integer value. Serialization always
copies the value. What if the value is non-serializable or hardly
serializable? And how to add lift-like functions, say, for
user-defined data types? These are very difficult questions, which are
finally answered.
First of all, `global' variables (bound outside the current compilation unit) in code values are interpreted as references to those `library' identifiers. There are no restrictions on their types. The following concerns `local' CSPs: identifiers in code values bound in the current computation unit:
x
in .< ... x ... >.
has the particular
type t
(integer, float, string, boolean, character and a few
others), the CSP occurrence is interpreted as if it were
.< ... .~(lift_t x) ... >.
.
.< ... .~(dyn_quote x) ... >.
, where dyn_quote
is an internal
MetaOCaml function that tries to convert a value to the code of that
value. The function examines the run-time value, without
knowing its type.
.<Obj.magic 0>.
. One
should keep in mind that 0
, None
, []
, false
, etc. are all
represented as integer 0 at run-time: hence the need for magic.
dyn_quote
generates what is essentially a `closure': a pair of the value and
the code .<csp_x>.
where csp_x
is a fresh identifier assumed to be
bound to the value when the code is run.
The bytecode run
does effect such binding. Therefore, it supports
CSPs at all types. The native-code run
does not currently support
such `code closures' (but may in the future, if there is a demand for
it). When a code value is printed, the values to which csp_x
identifiers are supposed to be bound are not shown (because
they are not serializable and, hence, non-printable).
Since N107 MetaOCaml also provides a set of lifting modules of the signature
module type lift = sig type t val lift: t -> t code endto help lift values of option, lift, array types. Users are encouraged to use such lifters in their code if they need to serialize particular CSPs.
User-land development is most extensive: we need more ways to `run' code values, by translating them to C, Fortran, LLVM, Verilog. MetaOCaml can then be used for generating libraries of specialized C, etc. code. User-land MetaOCaml development has the least cost of entry (no need to know OCaml internals) and is wide open for developers. Everyone can contribute, everyone is welcome.
A curious file in the original MetaOCaml CVS repository lists several short notes from the meeting in Paris on September 29, 2000 between Xavier Leroy and Walid Taha. One note stands out: ``we (Xavier!) were able to compile and run "run bracket 1"!!'' One may take this moment as the birth of MetaOCaml.
MetaOCaml started as a fork of OCaml along the lines of MetaML. Designed and architectured by Walid Taha and developed by Cristiano Calcagno, MetaOCaml had reached its current form by 2003. The environment classifiers appeared at the end of that year. Native code compilation and offshoring (translating the generated code into C and Fortran) were added in 2004.
As MetaOCaml was being developed, new versions of the mainline OCaml
were released with sometimes many fixes and improvements. The
MetaOCaml team tracked new OCaml releases and merged the changes into
MetaOCaml. (The MetaOCaml version number has as its base OCaml's
release version.) The merge was not painless. For example, any new
function in the OCaml compiler that dealt with Parsetree
(AST) or
Typedtree
has to be modified to handle MetaOCaml extensions to these
data structures. The merge process became more and more painful as the
two languages diverged. For instance, native code compilation that
first appeared in MetaOCaml 3.07 relied on SCaml, a large set of
patches to OCaml by malc at pulsesoft.com to support dynamic linking.
OCaml 3.08 brought many changes that were incompatible with
SCaml. Therefore, in MetaOCaml 3.08 the native compilation mode was
broken. The mode was brought back in the Summer 2005, by
re-engineering the SCaml patch and implementing the needed parts of
dynamic linking without any modification to the OCaml code. The
revived native compilation has survived through the end.
In 2006, MetaOCaml became the basis for Concoqtion. The increasing divergence between OCaml and MetaOCaml made it harder and harder to merge the changes. With the funds for the project dried up and the daunting prospect of merging many changes that appeared in OCaml 3.10 and 3.11, the development of MetaOCaml has ceased. Its last released version was 3.09.1 alpha 030.
Although the original development team could no longer support MetaOCaml, the users of the language, Jacques Carette in particular, could not bear it fall by the wayside. At the Jacques Carette's urging, the merge of OCaml 3.11 changes was undertaken in February 2010. The lack of funding and limited resources made it imperative to keep close to the mainline OCaml and avoid the divergence at all cost. Everything not directly related to staging was removed, including tag elimination and Concoqtion. 23 fewer files in the OCaml distribution were affected by MetaOCaml. Walid Taha suggested to name the new, bare-minimum line of MetaOCaml as BER MetaOCaml. Its first version N002 was released on March 1, 2010.
BER MetaOCaml established the separation of the kernel from the
user-level. Like with an OS kernel, changes to the MetaOCaml kernel
require `rebooting' (recompiling the MetaOCaml) and may render the
system unusable. User-level changes affect only particular programs
and are easier to develop. The MetaOCaml kernel is responsible for
producing and type-checking code values. User-level deals with
processing, or `running' code values: pretty-printing, executing,
compiling to byte-code, to native code, to C, etc. Programmers may
write new ways of processing code values -- for example, to compile
them to LLVM or JavaScript -- without modifying OCaml or BER
MetaOCaml. In the BER N002 distribution, the kernel included the 59KB
patch to OCaml distribution and the 54KB-long typing/trx.ml
;
user-level consisted of a pretty-printer (59KB source code) and the
support for byte-code execution of the generated code and the custom
top level (3KB, all in all). There was also 20KB of test code.
The 59KB patch to OCaml seems large. Part of it is necessary. Since
BER MetaOCaml extends the AST data type, Parsetree
, with variants
Pexp_bracket
, Pexp_escape
, Pexp_run
, Pexp_cspval
for the
staging constructs, any part of OCaml that processes the AST has to be
modified. Quite many parts of a compiler
deal with AST. In a multi-stage language, each expression is associated
with a stage level, present-stage or one of the progressively future stages.
The original MetaOCaml stored the level in an extra field
val_level
of the value_description
record that maintained type and
other information for each identifier. Since the map from identifiers to
value_description
is the typing environment, it is used all throughout the
type checker. Therefore, the addition of a new field required great
many changes. A new OCaml version modifies the type checker
quite heavily. Integrating all these modifications into MetaOCaml,
accounting for the new field val_level
, is a hard job. It is avoidable
however: we may associate identifiers with levels differently,
by adding a new map to the environment that
tells the level of each future-stage identifier.
Any identifier not in the domain of that map is assumed present-stage.
The value_description
record then
remains intact. BER MetaOCaml N100 (see below) takes this route.
The original MetaOCaml modified the AST further, to store the
information about data constructors used in the generated code
(actually, the AST that corresponded to the generated code contained
the entire present-stage environment of the type-checker.) Great many patches
to OCaml came that way. BER N100 imposed the constructor restriction
as described
elsewhere on this page, eliminating the extra AST modifications and the need
to patch OCaml further.
Thus, if one takes as the goal bringing MetaOCaml closer to OCaml,
the amount of modifications to OCaml can indeed be minimized -- as BER N100
has shown.
OCaml release 3.12 brought unexpectedly many new features to the language,
for example, first-class modules. OCaml 4.0 added GADTs and dynamic linking.
Integrating that many changes into BER MetaOCaml N002 was too depressing to
contemplate. BER MetaOCaml N100, developed in January 2013 and released on
January 31, is a clean-state re-implementation of MetaOCaml. Rather than
merging OCaml 4.0 into the MetaOCaml branch, BER N100 started anew.
It took the OCaml distribution, extended its AST with the staging
constructs, and kept adding code until the constructs were fully handled --
until there were no inexhaustive pattern-matching throughout the OCaml code.
The exhaustiveness check was very helpful. Great care was taken
to modify as fewer OCaml data structures as possible. Here is the break-out
of the BER N100 distribution: the kernel consists of
the 49KB patch to OCaml files (modifying 28 OCaml files,
6 of which trivially) and the 77KB-long completely
re-written typing/trx.ml
with very many comments.
User-level includes the 58KB pretty-printer (heavily modified by
Jacques Carette), and a number of small files to support top-level and run
,
3KB all in all. There are 54KB of tests, including, for the first time,
a regression test suite. BER N100 is much less invasive into OCaml:
compare the size of
the patch to the main OCaml type-checker typing/typecore.ml
, which
contains the bulk of the changes for type checking the staging
constructs. In the previous version BER N004, the patch had 564 lines
of additions, deletions and context; now, only 328 lines.
BER MetaOCaml N101, released in November 2013, cleaned-up MetaOCaml further. The positive experience with the scope extrusion check has made it possible to remove environment classifiers, and hence notably simplify the type checking of staging constructs. The operation to run the code no longer required dedicated type checking rules. In BER N101, it became an ordinary function, moved out of the MetaOCaml kernel. Internally, BER N101 improved the scope extrusion check and optimized the translation of staging constructs, especially binding forms. Pretty-printing of the code has become part of OCaml and hence no longer has to be maintained separately.
The version N102 released on January 1, 2015 was
another significant re-write, although with hardly any user-visible
changes. The BER N102 code makes the extensive use of attributes, the
new feature of OCaml 4.02. Staging annotations -- brackets, escapes
and CSP -- are now really annotations: attributes on the Parsetree and
Typedtree. A specifically Typedtree attribute marks non-expansive
nodes (the non-expansiveness check is performed before the
bracket-translation but is used only after). An attribute on
value_description
tells the staging level of the value. There is no
longer a separate Typedtree traversal pass, after the type checking,
to translate brackets and escapes. That means that for
staging-annotation-free code, MetaOCaml has no substantial overhead.
BER N102 has started on revamping cross-stage-persistence; quite a few
CSP have become printable and, mainly, serializable. Non-serializable
CSP were the only impediment to native MetaOCaml. The number of files
in the OCaml distribution that are patched by MetaOCaml fell down from
32 to mere 7. The patch size is reduced to 34KB. It is now a distinct
possibility that MetaOCaml becomes just a regular library or a
plug-in, rather being a fork.
Version N104 (January 1, 2017) brings back native MetaOCaml and implements two novel features, not seen in MetaOCaml or MetaML before: first-class pattern-matching and let-insertion as a primitive.
Version N107 (October 5, 2018) brings back
offshoring, in a different way, and adds explicit lifting. The code
type is no longer special and pre-defined (which means only 4 OCaml
files are modified by MetaOCaml: lexer, parser, pretty-printer, and
typecore.ml
).
Version N111 (October 5, 2020) adds let rec insertion: the facility to generate mutually recursive definitions whose size is not statically known. It is now possible to control the scope of let-insertion, which is often necessary when the let-bound expression is effectful. The well-scopedness guarantee is still maintained. Offshoring got more polish.
Version N114 (May 11, 2023) re-implements offshoring, adding, among
other things, normalization (removing @@
and |>
,
straightening nested
let-bindings, lifting let-bindings were possible and easy). Mutable
variables and pointer types are supported with no restrictions.
Offshoring is now supported completely, up to printing of C99 C
code. The offshoringIR intermediate language is usable outside
MetaOCaml (for generating C or WASM, etc.)
CSP are re-implemented: non-serializable CSPs are converted to
top-level let-bindings.
We now use Parsetree extension nodes metaocaml.bracket and
metaocaml.escape. The lexer is adaptable now: it lexes .>
exactly as OCaml, until it sees the first .<
(opening bracket).
Developing and especially maintaining a new language is gigantic work. Engineering a new language as a dialect of a well-maintained one is hard but pays in many ways.
ml2013-talk.pdf [200K]
MetaOCaml lives on:
Lessons from implementing a staged dialect of a functional language
The annotated slides of the talk at the ACM SIGPLAN Workshop on ML,
September 22, 2013. Boston, MA, USA
The original MetaOCaml was funded primarily by the NSF project ``ITR/SY(CISE): Putting Multi-Stage Annotations to Work'' led by Walid Taha. Most of the original development and implementation of staging was done by Cristiano Calcagno. Edward Pizzi has implemented the pretty-printing of code, later modified Jacques Carette. Xavier Leroy, INRIA, helped with the compiler specifics.
I am very grateful to Jacques Carette, Walid Taha, Jeremy Yallop, Yukiyoshi Kameyama and the members of his lab at the University of Tsukuba, Chung-chieh Shan, Cedric Cellier and Jun Inoue for many helpful discussions and encouragement. Reports and testing by Tran Minh Quang, Mustafa Elsheik, and Nicolas Ojeda Bar are appreciated.