The other goal of the course is to give a taste of modern software development, specifically: test-driven development, version control, and the stress on reading, comprehending and extending code rather than writing from scratch.
The characteristic of the course is an iterative, incremental development: we start with the most trivial source language, develop the full compiler for it, and then keep extending the source language and the compiler in small steps, reusing the earlier work as much as possible. At each iteration, we build a complete end-to-end compiler producing runnable and testable executables, for a (progressively larger) subset of the source language.
Another characteristic is the extensive use of tagless-final style, taking the full advantage of extensibility afforded by it. The extensibility here means reuse -- of type-checked and compiled artifacts from the previous increment -- rather than copy-paste. The compiler is hence structured as a stack of domain-specific languages, with parsing at the bottom and assembly at the top. The languages are extended by adding new operations here and there (and only occasionally by redirection).
Yet another feature is the attention given to names, or `variables', and associating attributes to them. Our approach, which readily permits adding attributes at will and analyzing variable usage, may remind some of algebraic effects.
We cover all standard material for the compiler course, from parsing and type-checking to analyses, optimizations, calling conventions and assembly generation -- but in a quite non-traditional fashion.
This course was taught as an elective 15-lecture upper-level undergraduate course in the Fall/Winter of 2022 and 2023.
<tfintro/0README.dr>
Code for the section: Introduction to tagless-final style
Free Variable as Effect, in Practice
A greatly simplified version of the course, which still, hopefully, gives
the taste of the incremental,
step-wise compiler development. The article also explains
extensible effects in compilation -- the technique currently used in the course.
emulator.ml [8K]
An elementary x86-64 emulator, to explain assembly in the
introduction. The knowledge of assembly is not a prerequisite.
ocamllex
/ocamlyacc
make
. It is designed to support the
incremental, step-wise development and has only what is needed. It is
simple, fast, and portable across MacOS, Linux and WSL.
The build tool is actually an OCaml script called build.ml
. Like
make
, the script accepts the list of
targets to make, and makes them in the given order, stopping in case
of error. The script is a regular OCaml file, and can be
edited as such: see the sample below. One should particularly note
compiler_manifest
, which lists all the source files needed to build the
compiler, and how to build them.
In 2022 the course used make
. The experience showed why a custom
build tool would be more more pleasant and convenient to use. First,
since our compiler is built by extending the earlier code, the source
code is spread through many steps (each in its own directory). The
code on step23 may use the source code files in step22/
, step21/
,
step20/
. The VPATH facility of make
supports exactly this
pattern, making make
look up the needed files. It is not clear,
however, what exactly make
finds and where exactly the code is. It is
much preferable to have the complete list of code files.
Some files from the previous steps have to be renamed before
compiling. For example, step21/lang.mli
should be renamed to
lang_21.mli
: it would be included under that name in
step22/lang.mli
. Previously I resorted to symbolic links, which
clutter the directory and are harder to maintain.
Since files are so spread out, automatic dependencies are difficult to
maintain. Ocamldep does not help here. On the other hand, to build the
final executable, we have to explicitly enumerate all needed .cmo
files in the correct order. The order has to be manually
maintained in any case. One may observe that this order is exactly the
order in which to compile the source code files. Therefore, we do not need
any tool to discover dependencies, since we have to list .cmo
files anyway. (We need to also add .cmi
files to the
build manifest). We hence assume consecutive
(single-threaded) build and lose on parallelism. In our
experience, the compilation is very fast (including the complete
rebuild), so parallelism is not necessary.
course-notes.pdf [1123K]
Section 4.1.1 of the Course notes explains how to use the build tool
build.ml [9K]
The implementation of the rules (including time-stamp checking, etc.)
and the main function
util
contains commonly used
code, in particular, the build tool and the testing framework.build.ml [9K]
The Build tool
expect.ml [5K]
Testing framework: confirm
the parsing, the generated code, and the result of executing the
compiled code
Makefile [<1K]
Makefile.common [<1K]