One of the important tasks of a compiler is the so-called ``common subexpression elimination'' -- detecting subexpressions denoting the same computation and arranging for that computation to be performed once, sharing the result. This optimization (significantly) improves both the running time and compactness of the code and is particularly important for hardware description and firmware compilers. As DSL implementers, it becomes our duty to detect duplicate subexpressions and have them shared. We call this detection implicit sharing, to contrast with the sharing explicitly declared by users. Imperative languages, where it matters whether it is an expression or its result are duplicated, have to provide a form (some sort of a local variable introduction) for the programmer to declare the sharing of expression's result. In this tutorial, we limit ourselves to side-effect--free expressions such as arithmetic expressions or combinational circuits. Explicit sharing is still important: sharing declarations can greatly reduce the search space for common subexpressions and prevent exponential explosions. Sharing declarations also help human readers of the program, drawing their attention to the `common' computations.
A common pitfall in implementing explicit sharing is attempting to use
the let
form of the host language to express sharing in the DSL
code. Although the host let
-form may speed up some DSL
interpretations, it leads to code duplication rather than sharing in
the compilation result.
Our technique is a pure veneer over hash-consing, hiding its seemingly imperative nature beneath a simple combinator language. The overall implementation remains pure functional and easy to reason about. Our technique makes adding explicit sharing particularly easy.
Node
in a DAG
, a hash-consed
data structure. The interpretation detects common subexpressions and
shares them. Rather then converting an abstract syntax tree to a DAG,
we build the DAG from the beginning.let
-form, is needed already in a
side-effect--free language. The ExpLet.hs
code explains why, adding
explicit sharing to our DSL. The code demonstrates how explicit
sharing greatly helps the implicit one.a
and integers, with the operations to retrieve the value given
its key, to find the key for the existing value, and to extend the
bijection with a new association. The type a
of values should at
least permit equality comparison; In the present implementation, we
require a
to be a member of Ord. There are many ways to implement
bi-maps, for example, using hash tables, or maps. The present
implementation relies on Data.Map
and Data.IntMap
to record both
parts of the association.
sklansky
in
the monadic style. We do have to change the example to insert explicit
sharing. But we do not have to do the wholesale re-writing, retaining
the list comprehension in sklansky
.
Ershov's short paper fully describes what we now call hash tables, hash functions, the useful properties of hash functions, and hash-consing. The paper analyzes the time complexity of the algorithm. Since the algorithm has two exits, writing programs in the continuation-passing style must have been familiar back then.
The library presented at DSL 2011 unwittingly follows Ershov's algorithm closely. The library is (hopefully) better described: see the preface to the English translation of Ershov's paper. Nowadays, starting a paper with the phrase ``All unexplained terms are those from [1]'' (where [1] is the single reference) would not be taken kindly by reviewers.
А.П.Ершов. О Программировании
Арифметических Операторов
Доклады Академии наук СССР, 1958, т.118, N3,
427-430.
(Математика) Представлено академиком
С.Л.Соболевым 2 VII 1957.