The characteristic of the course is an iterative, incremental development: we start with the most trivial source language, develop the full compiler for it, and then keep extending the source language and the compiler in small steps, reusing the earlier work as much as possible. At each iteration, we build the complete, end-to-end compiler producing runnable and testable executables, for a (progressively larger) subset of the source language.
Another characteristic is the extensive use of tagless-final style, taking the full advantage of extensibility afforded by it. Extensibility here means reuse -- of type-checked and compiled artifacts from the previous increment -- rather than copy-paste. The compiler is hence structured as a stack of domain-specific languages, with parsing at the bottom and assembly at the top. The languages are extended by adding new operations here and there (and only occasionally by redirection).
We cover all standard material for the compiler course, from parsing and type-checking to analyses, optimizations, calling conventions and assembly generation -- but in a quite non-traditional fashion.
This course was first taught as an elective 15-lecture upper-level undergraduate course in the Fall of 2022.
Code for the section: Introduction to tagless-final style
Free Variable as Effect, in Practice
A greatly simplified version of the course, which still, hopefully, gives the the taste of the incremental, step-wise compiler development. The article also explains extensible effects in compilation -- the technique currently used in the course.
make. It is designed to support the
incremental, step-wise development and has only what is needed. It is
simple, fast, and portable across MacOS, Linux and WSL.
The build tool is actually an OCaml script called
make, the script accepts the list of
targets to make, and makes them in the given order, stopping in case
of error. The script is a regular OCaml file, and can be
edited as such: see the sample below. One should particularly note
compiler_manifest, which lists all the source files needed to build the
compiler, and how to build them.
In 2022 the course used
make. The experience showed why a custom
build tool would be more more pleasant and convenient to use. First,
since our compiler is built by extending the earlier code, the source
code is spread through many steps (each in its own directory). The
code on step23 may use the source code files in
step20/. The VPATH facility of
make supports exactly this
make look up the needed files. It is not clear,
however, what exactly
make finds and where exactly the code is. It is
much preferable to have the complete list of code files.
Some files from the previous steps have to be renamed before
compiling. For example,
step21/lang.mli should be renamed to
lang_21.mli: it would be included under that name in
step22/lang.mli. Previously I resorted to symbolic links, which
clutter the directory and are harder to maintain.
Since files are so spread out, automatic dependencies are difficult to
maintain. Ocamldep does not help here. On the other hand, to build the
final executable, we have to explicitly enumerate all needed
files in the correct order. The order has to be manually
maintained in any case. One may observe that this order is exactly the
order in which to compile the source code files. Therefore, we do not need
any tool to discover dependencies, since we have to list
files anyway. (We need to also add
.cmi files to the
build manifest). We hence assume consecutive
(single-threaded) build and lose on parallelism. In our
experience, the compilation is very fast (including the complete
rebuild), so parallelism is not necessary.
Section 4.1.1 of the Course notes explains how to use the build tool
The implementation of the rules (including time-stamp checking, etc.) and the main function
util contains commonly used
code, in particular, the build tool and the testing framework.
The Build tool
Testing framework: confirm the parsing, the generated code, and the result of executing the compiled code