.mli or .ml file, resp.), a functor cannot. This article describes two problems of separate compilation hindering extensibility and incremental development: re-linking with another implementation of a library, and extending a library with new operations. We consider it unacceptable to edit the existing code in order to use a different/extended library implementation; when making an extension we want to keep the old source as it is, and only refer to rather than copy it. Ideally, any recompilation of the existing code should be avoided.
To be concrete, suppose we have an interface LA.mli and its two implementations, EvalA.ml and PpA.ml. There is no LA.ml. We want to write the user code ExA.ml treating LA as an ordinary library:

    open LA
    ... code using LA operations ...

compile that code to ExA.cmo, and link this compiled ExA.cmo to either EvalA.cmo or PpA.cmo implementations of LA. Although straightforward in many languages, e.g., C, it is next to impossible in OCaml. (In all fairness, it is so simple in C because C does not have a module system.) For the next problem, suppose EvalA.ml has a public interface EvalA.mli abstracting its implementation details. Adding a new operation to both the interface and the implementation -- not touching or copying the original files -- is again next to impossible, when implementing the new operations requires the details abstracted away by the EvalA.mli interface.
The problematic interactions of the module system with separate compilation came to the fore in the course ``Compilers: Incrementally and Extensibly''. Although workarounds were eventually found, described below, they leave a sense of dissatisfaction, even if purely aesthetic. We make two proposals for what might be a satisfactory solution.
I hasten to say that the problems are not of the module system per se but of its interaction with separate compilation. In short, a module (structure) can be ascribed several signatures (with different levels of abstraction); likewise, a single signature can be ascribed to several structures (i.e., different implementations of the signature). Alas, separate compilation imposes a one-to-one correspondence between .ml (module) and .mli (signature) files, which hinders extensibility.
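The many-to-many correspondence is easy to see when everything lives in a single file. The following is a minimal sketch, not taken from the article's code base: the names COUNTER, READONLY, Counter and SlowCounter are hypothetical and chosen only for illustration. One structure receives two signatures of different abstraction levels, and one signature is ascribed to two structures.

```ocaml
(* All names here are hypothetical; this only illustrates the idea. *)
module type COUNTER = sig
  type t
  val zero : t
  val succ : t -> t
  val to_int : t -> int
end

(* A more restrictive signature: values can be read but not built up *)
module type READONLY = sig
  type t
  val zero : t
  val to_int : t -> int
end

module Counter = struct
  type t = int
  let zero = 0
  let succ n = n + 1
  let to_int n = n
end

module SlowCounter = struct
  type t = unit list
  let zero = []
  let succ l = () :: l
  let to_int = List.length
end

(* One structure, two signatures *)
module C1 : COUNTER = Counter
module C2 : READONLY = Counter

(* One signature, two structures *)
module C3 : COUNTER = SlowCounter

let () =
  assert (C1.(to_int (succ (succ zero))) = 2);
  assert (C3.(to_int (succ (succ zero))) = 2);
  assert (C2.(to_int zero) = 0)
```

Within one file none of this needs any ceremony; the rest of the article is about what happens when Counter and COUNTER become separate .ml and .mli files.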
Compilers: Incrementally and Extensibly
The compiler course
The running example deals with DSL embedding: it is the prototypical example of the tagless-final style. The DSL here has only integers and subtraction: just enough to make the point. Formally, its syntax is defined by the following signature:

    module type LA = sig
      type repr
      val int : int -> repr
      val sub : repr -> repr -> repr
      val observe : repr -> unit
    end

To wit, DSL expressions are represented as OCaml values of the abstract type repr, produced by the operations int and sub. The (completed) expressions may also be observed, that is, printed out. (The observation type could be more interesting; for our exposition, printing suffices.)
Having defined the DSL, we may already write its expressions, as follows. They are parameterized by the implementation of the LA signature -- the DSL interpreter.

    module ExA(L:LA) = struct
      open L
      let term = sub (sub (int 4) (int 0)) (sub (int 0) (int (-1)))
    end

One may imagine many implementations of LA. The first that probably comes to mind is a meta-circular evaluator, which maps DSL operations to OCaml's.

    module EvalA = struct
      type repr = int
      let int x = x
      let sub = (-)
      let observe = Printf.printf "The result: %d\n"
    end

Interpreting the sample expression ExA using EvalA, as

    let module M = ExA(EvalA) in
    EvalA.observe M.term

prints the expected result 3.
It does not take long to think of other implementations of LA: e.g., to pretty-print its expressions.

    module PpA = struct
      open Seq
      type repr = string t
      let int x = string_of_int x |> return
      let paren e = ...
      let sub x y = append x (cons " - " y) |> paren
      let observe x = ...
    end

Interpreting the same ExA using PpA, as

    let module M = ExA(PpA) in
    PpA.observe M.term

prints ((4 - 0) - (0 - -1)) as the result.
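For readers who want something runnable right away, here is a sketch of a pretty-printer that uses plain strings instead of Seq. It is not the article's PpA (whose paren and observe are elided above); the module name PpStr and its definitions are assumptions made for this illustration only. The signature LA and the functor ExA are repeated so the fragment stands on its own.

```ocaml
module type LA = sig
  type repr
  val int : int -> repr
  val sub : repr -> repr -> repr
  val observe : repr -> unit
end

module ExA (L : LA) = struct
  open L
  let term = sub (sub (int 4) (int 0)) (sub (int 0) (int (-1)))
end

(* PpStr is a hypothetical stand-in for PpA: strings rather than Seq *)
module PpStr = struct
  type repr = string
  let int = string_of_int
  let paren e = "(" ^ e ^ ")"
  let sub x y = paren (x ^ " - " ^ y)
  let observe = print_endline
end

let () =
  let module M = ExA (PpStr) in
  PpStr.observe M.term   (* prints ((4 - 0) - (0 - -1)) *)
```

The point is the same as with PpA: the user code ExA is untouched, and only the functor argument changes.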
We have thus seen two modules (structures) -- EvalA and PpA -- implementing the same signature, LA. In the Compiler class, the type checker, code generator, etc. are all interpreters of the signature that defines the source language.
Let us extend our DSL with another operation: multiplication. First we extend the language definition:

    module type LB = sig
      include LA
      val mul: repr -> repr -> repr
    end

The module system lets us literally write what we mean: take an existing collection of definitions and add to it. The older version remains as it was: not modified and not copied. We may now write DSL expressions with multiplication (and reuse earlier expressions as they are):

    module ExB(L:LB) = struct
      open L
      module EA = ExA(L)
      let term = mul EA.term (int 2)
    end

Extending the evaluator is just as simple as extending the language definition: merely adding the interpretation of the new operation:

    module EvalB = struct
      include EvalA
      let mul = ( * )
    end

Since EvalB is an extension of EvalA, it can interpret the old example ExA, with the same result:

    let module M = ExA(EvalB) in
    EvalB.observe M.term

In other words, we have `linked' the existing user code ExA -- as is, without any modifications -- with an enhanced/improved implementation of the LA library. Needless to say, EvalB also interprets the extended example ExB:

    let module M = ExB(EvalB) in
    EvalB.observe M.term

printing 6 as the result. We have just witnessed ascribing the same module, EvalB, two different signatures: LA (so that ExA, which requires as its argument a module of the LA signature, can be applied to it) and LB.
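The double ascription can also be spelled out explicitly. The sketch below repeats the definitions from this section so that it is self-contained, and adds two anonymous module bindings (`module _`, available since OCaml 4.08) whose only purpose is to ask the type checker to confirm that EvalB matches both LA and LB. The final assertions are an addition for illustration and are not part of the article's code.

```ocaml
module type LA = sig
  type repr
  val int : int -> repr
  val sub : repr -> repr -> repr
  val observe : repr -> unit
end

module type LB = sig
  include LA
  val mul : repr -> repr -> repr
end

module ExA (L : LA) = struct
  open L
  let term = sub (sub (int 4) (int 0)) (sub (int 0) (int (-1)))
end

module ExB (L : LB) = struct
  open L
  module EA = ExA (L)
  let term = mul EA.term (int 2)
end

module EvalA = struct
  type repr = int
  let int x = x
  let sub = ( - )
  let observe = Printf.printf "The result: %d\n"
end

module EvalB = struct
  include EvalA
  let mul = ( * )
end

(* The same structure checked against two signatures *)
module _ : LA = EvalB
module _ : LB = EvalB

let () =
  let module MA = ExA (EvalB) in
  let module MB = ExB (EvalB) in
  assert (MA.term = 3);   (* 4 - (0 - (-1)) *)
  assert (MB.term = 6)    (* 3 * 2 *)
```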
Extending PpA to obtain PpB is just as straightforward:

    module PpB = struct
      include PpA
      open Seq
      let mul x y = append x (cons " * " y) |> paren
    end

The extended PpB can pretty-print the old ExA (with the same result) and now ExB.
We have thus seen what extensibility means, concretely, and how the many-to-many correspondence between modules and signatures plays into it. All the code was in a single file, however -- compiled together rather than separately.
warmup.ml [3K]
The complete code for the example: all in one file.
EvalA and PpA of the signature LA, and the linking with the user code ExA. The user code is also compiled separately, unaware of the implementations, and should be linkable with either without touching or even recompiling. Alas, it is not actually linkable -- not without touching the code base by changing or adding source files. Inevitably, re-linking requires re-compilation. The section describes why, how to work around -- and how OCaml could be changed to obviate the workarounds.
The DSL definition goes into the file LA.mli:

    type repr
    val int: int -> repr
    val sub: repr -> repr -> repr
    val observe: repr -> unit

Each implementation also gets its own file. The evaluator is in the file EvalA.ml with the content

    type repr = int
    let int x = x
    let sub = (-)
    let observe = Printf.printf "The result: %d\n"

and the pretty-printer in PpA.ml. The file LA.mli is compiled as if its content were wrapped into module type LA = sig ... end. Likewise, EvalA.ml is compiled assuming the wrapper module EvalA = struct ... end around it. This assumption is the convenient syntax sugar provided by OCaml.
Alas, this syntax sugar does not extend to functors, such as ExA -- the users of our DSL. A functor may of course be placed in a file, say, ExAFunc.ml:

    module ExA(L: module type of LA) = struct
      open L
      let term = sub (sub (int 4) (int 0)) (sub (int 0) (int (-1)))
    end

As just explained, it is compiled as if it were wrapped into module ExAFunc = struct ... end. That is, the functor is compiled not as top-level, so to speak, but as a part of another module. The distinction shows up in linking.
To build the complete program one has to explicitly apply the functor ExA to a suitable implementation of LA, say, EvalA. We need a linking file, so to speak: ExAEval.ml, as follows.

    let module M = ExAFunc.ExA(EvalA) in
    EvalA.observe M.term

and a similar file ExAPp.ml for applying the pretty-printer. Assuming all .mli and .ml files are already compiled, the following command line builds the whole program, for the EvalA implementation of LA.

    ocamlc EvalA.cmo ExAFunc.cmo ExAEval.cmo

To use the PpA implementation, we build as

    ocamlc PpA.cmo ExAFunc.cmo ExAPp.cmo

ExA in a single program. For example, concatenating ExAEval.ml and ExAPp.ml into ExABoth.ml and building with it gives the program that shows the results of both evaluating and pretty-printing ExA's expression.
ExAEval for the functor application. To use ExA with the PpA implementation, we have to introduce a new linking file ExAPp.ml containing the copy of ExAEval.ml with substituting PpA for EvalA -- or modify ExAEval.ml in place. Both choices -- cut-and-paste with substitution and especially destructive modification -- are unappealing.
ExAFunc.ml is compiled as a functor: therefore, calls to LA operations are indirect. The compiled ExAFunc.cmo cannot benefit from link-time optimizations.
ExA as a top-level module rather than a functor. For example, as the file ExA.ml:

    open LA
    let term = sub (sub (int 4) (int 0)) (sub (int 0) (int (-1)))
    let () = observe term

Not only is having the definitions at top level aesthetically pleasing: LA operations are now compiled as direct calls.
ExA.ml actually compiles, even though LA.ml does not exist: to compile a library user code we only need the library interface, LA.cmi. Regretfully, the straightforward linking of ExA.cmo with an LA implementation such as EvalA.cmo fails:

    ocamlc EvalA.cmo ExA.cmo
    Error: Module `LA' is unavailable (required by `ExA')

If we examine ExA.cmo using ocamlobjinfo, we see

    Unit name: ExA
    Interfaces imported:
        79b0e9d3b6f7fed07eb3cc2abb961b91  Stdlib
        d9378d8b5a64375e0a4765907a7028ed  LA
        bf853957655a3a1eb3caac1964887180  ExA
        8f8f634558798ee408df3c50a5539b15  CamlinternalFormatBasics
    Required globals:
        LA

ExA.ml imports LA and uses its operations; predictably ExA.cmo contains the reference to this interface: to its name and the hash. (The hash is computed when compiling LA.mli and stored, along with the interface name, in LA.cmi). An implementation of LA would likewise tell the name/hash of the interface it provides. Name/hash matching is enough to ensure coherence, that is, a linked implementation indeed providing the required interface. ExA.cmo, however, not only refers to imported interfaces (i.e., interfaces whose implementations are required) -- but also to a specific implementation of the LA interface, also named LA. Therefore, ExA.cmo can be linked only with LA.cmo -- and not with any other module that may implement the LA interface.
Such a rigidity -- insisting on linking with a particular named module rather than any provider of the required interface -- is strange. It is a consequence of an old design decision that the correspondence between separately-compiled modules and the provided interfaces be one-to-one.
Below we show how to work around this design decision and allow different modules to serve as implementations of the same signature: in effect, to link the user code with different library implementations.
LA, must be in the .cmo file specifically named LA.cmo. Therefore, to link with the EvalA implementation of the LA interface, we have no other choice but to produce the file named LA.cmo. Hence the work-around:

    ocamlc -c -o LA.cmo EvalA.ml   # compiling an implementation of LA
    ocamlc LA.cmo ExA.ml           # linking

These two commands indeed produce an executable with ExA using the EvalA implementation. To use the PpA implementation instead, one has to build the executable as

    ocamlc -c -o LA.cmo PpA.ml     # compiling an implementation of LA
    ocamlc LA.cmo ExA.ml           # linking

The reader has probably noticed ExA.ml rather than the expected ExA.cmo in the linking step. That is, every time we link with a new implementation of LA, we have to re-compile the user code. Re-linking inevitably requires re-compilation. The user code has to be re-compiled because the interface LA.cmi it depends on changes when compiling the implementation:

    ocamlc -c -o LA.cmo EvalA.ml

Given the existing LA.cmi, one would expect the compiler here to check if EvalA satisfies it: that is, if the LA signature can be ascribed to EvalA. The compiler (OCaml 4.14), however, does something quite strange: it deletes the existing LA.cmi, without warning, and makes a new one, based on the module type of EvalA. Instead of ascribing a signature to an implementation, the compiler changes the signature to match the implementation. This is a strange behavior of the current OCaml system, which we propose to eliminate.
Our work-around here is only partial: re-compilation is still needed for re-linking.
LA.mli and its (purported) implementations EvalA.ml and PpA.ml, one would think the following would try to ascribe the interface to an implementation

    ocamlc -c LA.mli               # producing LA.cmi
    ocamlc -c -o LA.cmo EvalA.ml

If the compilation succeeded, the resulting LA.cmo may then be linked to any user of LA interface. As we have just seen, that does not work.
It is possible nevertheless to ascribe a separately compiled interface to separately compiled and arbitrarily named implementations -- that is, to work around the one-to-one correspondence between a separately-compiled implementation and its interface. In fact, it is possible in two different ways (although neither is fully satisfactory).
The first method uses symbolic links. Assume that LA.cmi already exists and the user code ExA.cmo is compiled against it.

    ln -s EvalA.ml LA.ml    # create the file LA.ml with the same contents as EvalA.ml
    ocamlc -c LA.ml         # check that LA.ml satisfies LA.cmi, and produce LA.cmo
    ocamlc LA.cmo ExA.cmo   # linking

The second method relies on explicit interface files for each implementation (explained in more detail in the next section). Again, assume that LA.cmi already exists and the user code ExA.cmo is compiled against it. Also assume the file LA-incl.mli with the single line:

    include module type of LA

The build is performed as follows:

    ln -s LA-incl.mli EvalA.mli    # make EvalA.mli, effectively equal to LA.mli
    ocamlc -c EvalA.mli            # make EvalA.cmi
    ocamlc -c -o LA.cmo EvalA.ml
    ocamlc LA.cmo ExA.cmo          # linking

The last-but-one compilation command ascribes the signature EvalA (which is effectively LA) to the module EvalA, and compiles it under the name LA.cmo.
The work-around leads to an actionable proposal.
When compiling A.ml under a different name B.cmo, as in

    ocamlc -c -o B.cmo A.ml

check if there is B.mli (if so, check it is compiled, to B.cmi) and use this interface to ascribe to B.cmo. In other words: when compiling A.ml under a different name B.cmo, behave exactly as if the source A.ml were named B.ml.
.cmo) should refer to required interfaces rather than to required globals (module names). In other words, since the one-to-one correspondence between compiled modules and their interfaces can be worked around, there is no sense in clinging to it. Separate compilation should not restrict ascribing a signature to an implementation.
0README.dr [<1K]
The complete source code (in the same directory as the index file)
LA to LB) and its implementations (EvalA to EvalB, and similarly for Pp). The one-to-one correspondence of a separately compiled implementation to its signature is the problem here as well -- which can be worked around. The work-around is used extensively in the Compiler course -- so extensively that a custom build system has been written around it. The workaround is rather simple; something like that could be incorporated in OCaml.
As a preliminary step, let's fix, if not a problem, then a blemish in the earlier examples. Modules EvalA and PpA are meant to be implementations of LA. That intention, however, was not made explicit to the compiler, and hence cannot be checked at the time of separately compiling EvalA.ml and PpA.ml. If an implementation does not really match the interface, the error is reported when linking with the user code. To report such errors earlier, when compiling the implementation -- and to make clear to ourselves the interface the module EvalA is meant to fulfill -- we should have created the EvalA.mli. Since EvalA.ml is to be an implementation of the LA signature, EvalA.mli should be a copy of LA.mli, or better, a reference to it. That is, EvalA.mli contains the single line:

    include module type of LA

Had EvalA.ml omitted, say, the int operation, it would no longer compile: the operation int is required by EvalA.mli (that is, LA.mli).
At first, the extension of the interface and implementation seems straightforward, just as in the non-separate compilation. We introduce LB.mli containing

    include module type of LA
    val mul: repr -> repr -> repr

(optionally, file EvalB.mli with the copy of it), and the file EvalB.ml adding mul to EvalA:

    include EvalA
    let mul : repr -> repr -> repr = ( * )

Alas, EvalB.ml does not compile:

    3 | let mul : repr -> repr -> repr = ( * )
                                         ^^^^^
    Error: This expression has type int -> int -> int
           but an expression was expected of type repr -> repr -> repr
           Type int is not compatible with type repr = EvalA.repr

The signature EvalA (which is equal to LA) ascribed to the included EvalA made the type repr abstract. To implement mul on the type repr, however, we need to know the concrete type of repr, and be sure it is int.
To use EvalA, it behooves us to ascribe it a signature that hides the implementation. But to extend EvalA, we need a signature that exposes full detail. We really need to ascribe different signatures to the same module. Although the OCaml module system has this ability, the separate compilation does not. For each .ml file there may be only one .mli file (specified by the user, or made implicitly by the compiler), with the signature to ascribe to the corresponding .ml module. Once ascribed, the signature cannot be removed, and a more transparent signature cannot be ascribed.
The work-around is straightforward, as before: if EvalA.ml may take only one ascribed signature, EvalA.mli, to ascribe another we have no choice but to give the file EvalA.ml another name, say, EvalA_impl.ml. Many file systems allow aliasing. The extended implementation, EvalB.ml, should then include EvalA_impl. Overall:

    ln -s EvalA.ml EvalA_impl.ml   # Alias EvalA.ml as EvalA_impl.ml
    ocamlc -c EvalA_impl.ml        # make EvalA_impl.cmo
    ocamlc -c EvalB.ml             # Compile the extension to EvalA_impl.cmo

The last-but-one command compiles EvalA_impl.ml ascribing the (default) signature, fully exposing the implementation details. Compiling EvalB.ml in the last command then succeeds since the concrete type of repr is exposed as int.
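The effect of the aliasing trick can be reproduced in a single file, which may help to see what the symbolic link buys us. The sketch below is an analogy under stated assumptions, not the separately compiled setup itself: the nested module EvalA_impl plays the role of EvalA_impl.ml (no ascribed signature, so repr = int stays visible), and the sealed module EvalA plays the role of EvalA.ml compiled against EvalA.mli.

```ocaml
module type LA = sig
  type repr
  val int : int -> repr
  val sub : repr -> repr -> repr
  val observe : repr -> unit
end

module type LB = sig
  include LA
  val mul : repr -> repr -> repr
end

(* Stands in for EvalA_impl.ml: the default, fully transparent signature *)
module EvalA_impl = struct
  type repr = int
  let int x = x
  let sub = ( - )
  let observe = Printf.printf "The result: %d\n"
end

(* Stands in for EvalA.ml with EvalA.mli: the same code, sealed with LA *)
module EvalA : LA = EvalA_impl

(* Extending the sealed module is rejected, because repr is abstract there:
     module EvalB : LB = struct include EvalA let mul = ( * ) end        *)

(* Extending the transparent alias works, and the result can be sealed again *)
module EvalB : LB = struct
  include EvalA_impl
  let mul = ( * )
end

let () = EvalB.(observe (mul (sub (int 4) (int 1)) (int 2)))
(* prints: The result: 6 *)
```

In one file the two views of the same code coexist without any renaming; with separate compilation, the second view exists only because the source was aliased under another file name.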
The work-around is ungainly, relying on symbolic links that have to be made and cleaned up. The earlier concrete proposal, if implemented, would make it unnecessary. Recall, the proposal was: When compiling A.ml under a different name B.cmo, as in

    ocamlc -c -o B.cmo A.ml

behave exactly as if the source A.ml were named B.ml.
0README.dr [<1K]
The complete source code (in the same directory as the index file)
The build system for the Compiler course, designed to support the incremental, step-wise development.
We have emphasized a non-destructive extension/evolution of libraries: to link with a different library or to extend an existing one, no old code should be modified, or copied/cut-and-pasted -- or even re-compiled. The old code base is always available as a fall-back.
Keeping old versions as is, untouched, both in source and compiled form, may remind one of version control. Our approach `version-controls' not just source but also the compiled artifacts. Mainly, instead of looking at diffs in a repo, through a repo-specific interface, in our development approach the diffs are the source code. We write an extension as a diff of sorts, by referring to the old code and adding new definitions. Our `diffs' are hence intentional and semantically meaningful -- and can be viewed as the source OCaml code with all convenience: syntax highlighting, jump to the definition, type tooltips, Merlin, etc. The Compiler class has demonstrated that such incremental development scales.