Implementing Explicit and Finding Implicit Sharing in Embedded DSLs
The distilled tutorial on sharing and common subexpression elimination
in embedded domain-specific language compilers was presented at DSL
2011: IFIP Working Conference on Domain-Specific Languages, which took
place on 6-8 September 2011 in Bordeaux, France. This page collects
the tutorial material in the form of the paper published in the DSL
2011 Proceedings, the extensively commented Haskell code, and
solutions to selected exercises.
- We present the detection of implicit sharing and the
implementation of explicit sharing in the context of
embedded domains-specific language (DSL) compilers. The sharing
implementation techniques have since found other uses, e.g., writing SAT
solvers. Embedded compilers -- typical for circuit description,
embedded control systems or GPU programming DSLs -- complement the
familiar embeddings of a DSL in a host language as a library or an
interpreter. For example, we may build a circuit model in
Haskell using gate descriptions and combinators provided by the DSL
library. We may test the circuit in Haskell by running the model
on sample inputs. Eventually we have to produce Verilog
code or a Netlist to manufacture the circuit. Likewise, a control
system DSL program should eventually be compiled into machine code and
burned in as firmware. An embedded compiler thus is a host language
program that turns a DSL program into a (lower-level) code.
One of the important tasks of a compiler is the so-called ``common
subexpression elimination'' -- detecting subexpressions denoting the
same computation and arranging for that computation to be performed
once, sharing the result. This optimization (significantly)
improves both the running time and compactness of the code and is
particularly important for hardware description and firmware
compilers. As DSL implementers, it becomes our duty to detect
duplicate subexpressions and have them shared. We call this detection
implicit sharing, to contrast with the sharing explicitly declared by
users. Imperative languages, where it matters whether it is an expression
or its result are duplicated, have to provide a form (some sort of a
local variable introduction) for the programmer to declare the sharing
of expression's result. In this tutorial, we limit ourselves to
side-effect--free expressions such as arithmetic expressions or
combinational circuits. Explicit sharing is still important: sharing
declarations can greatly reduce the search space for common
subexpressions and prevent exponential explosions.
Sharing declarations also help human
readers of the program, drawing their attention to the
A common pitfall in implementing explicit sharing is attempting to use
let form of the host language to express sharing in the DSL
code. Although the host
let-form may speed up some DSL
interpretations, it leads to code duplication rather than sharing in
the compilation result.
Our technique is a pure veneer over hash-consing, hiding its seemingly
imperative nature beneath a simple combinator language. The overall
implementation remains pure functional and easy to reason about. Our
technique makes adding explicit sharing particularly easy.
- sharing.pdf [291K]
The paper published in the proceedings of DSL 2011
(Olivier Danvy and Chung-chieh Shan, Eds.)
EPTCS.66, 2011, pp. 210-225.
- The tutorial is the presentation of the library for detecting implicit
and implementing explicit sharing. For clarity, the tutorial is
centered around a simple (sub)DSL of arithmetic expressions embedded
in Haskell. The running examples, albeit simplistic, are abstracted
from real-life, reported problems.
- ExpI.hs [7K]
- This project was prompted by a mailing list message reporting an
unsuccessful attempt to detect and eliminate common
subexpressions when implementing a real DSL compiler.
Show-stopping was the expression
comparison, which, in pure language is structural and requires the
full traversal of expressions. We distill that attempt, presenting
our DSL of arithmetic expressions and its embedding in Haskell as a
data type. We introduce the two simple running examples:
multiplication by a known integer constant and
the optimal running sum computation (sklansky).
- Ptrs.hs [6K]
- Reviewing the main approaches to speeding-up the comparison of DSL
expressions, all relying on some sort of `pointer' equality, or, on
labeling expressions with unique pieces of data (e.g., integers) that
admit efficient comparison. The first approach is manual
labeling. The second uses the state monad to automate label
assignment. The third, popular but treacherous approach is the so-called
observable sharing. The code illustrates its dangers. None of these
approaches help with common sub-expressions unrelated to pointer
- ExpF.hs [12K]
- The first part of our solution: introducing the tagless-final
embedding of the DSL and presenting two interpreters, to evaluate
expressions and to print them. We then add another interpreter,
interpreting a DSL expression as a
Node in a
DAG, a hash-consed
data structure. The interpretation detects common subexpressions and
shares them. Rather then converting an abstract syntax tree to a DAG,
we build the DAG from the beginning.
- ExpLet.hs [7K]
- Explicit sharing, the
let-form, is needed already in a
side-effect--free language. The
ExpLet.hs code explains why, adding
explicit sharing to our DSL. The code demonstrates how explicit
sharing greatly helps the implicit one.
- BiMap.hs [2K]
- The utility code: Establishing a bijection between the values of the
a and integers, with the operations to retrieve the value given
its key, to find the key for the existing value, and to extend the
bijection with a new association. The type
a of values should at
least permit equality comparison; In the present implementation, we
a to be a member of Ord. There are many ways to implement
bi-maps, for example, using hash tables, or maps. The present
implementation relies on
Data.IntMap to record both
parts of the association.
- DSLSharing.hs [17K]
- The old code illustrating implicit and explicit sharing in embedded
DSLs. The code uses a slightly richer object language. It was posted
on the Haskell-Cafe list in February 2008 in response to Tom Hawkins'
- ExpLetList.hs [5K]
This is the solution to exercise 13 in the DSL paper, which was the
challenge posed by Matthew Naylor back in 2008. The solution
demonstrates explicit sharing in the sklansky example, without adding
lists to our DSL. Mainly, we explicitly avoid rewriting
the monadic style. We do have to change the example to insert explicit
sharing. But we do not have to do the wholesale re-writing, retaining
the list comprehension in
- Despite the Lisp-sounding name, hash-consing was invented before
Lisp. It was described, for the English audience, in the first volume
of `Communications of the ACM' in 1958, as the translation of the
paper by A.P. Ershov that appeared in the Proceedings of the Academy
Of Sciences of the USSR earlier that year. The translation is quite
accurate, as far as I could see, but drops the flowcharts and the
figure depicting hash function distributions. The terminology may be
confusing: what is called `conditional number' in the translated
article is now called `hash-code'. The standard terminology has not
been invented yet.
Ershov's short paper fully describes what we now call hash tables,
hash functions, the useful properties of hash functions, and
hash-consing. The paper analyzes the time complexity of the
algorithm. Since the algorithm has two exits, writing programs in the
continuation-passing style must have been familiar back then.
The library presented at DSL 2011 unwittingly follows Ershov's
algorithm closely. The library is (hopefully) better described: see
the preface to the English translation of Ershov's paper. Nowadays,
starting a paper with the phrase ``All unexplained terms are those
from '' (where  is the single reference) would not be taken
kindly by reviewers.
- A. P. Ershov: On programming of arithmetic operations
Communications of the ACM, 1958, v1, N8, pp. 3-6.
А.П.Ершов. О Программировании
Доклады Академии наук СССР, 1958, т.118, N3,
(Математика) Представлено академиком
С.Л.Соболевым 2 VII 1957.