The common way of raising expressiveness is complicating types. System F, the polymorphic calculus, can represent integer arithmetic -- and via the Boehm-Berarducci (or Goedel) encoding, inductive data structures such as lists and trees. System F is strongly normalizing, maintaining connections with logic -- but its semantics is quite more complicated. Also, Boehm-Berarducci (let alone Goedel) encodings are inherently inefficient. One may also complicate the type system by adding recursive types, which lets us embed untyped lambda calculus and represent anything, and efficiently -- at the cost of even more complicated semantics.
A different direction is to maintain simple types but add constants. Just adding booleans (or integer literals with zero-branching but no other operations on them) lets us define pairs and sums -- the known, but not widely, result. What else can we express if we add successor or addition, etc?
If we add the basic integer data type and the constants for successor, predecessor, branching-on-zero, fixpoint, we get the calculus called PCF. Although its functions are generally partial, it does have a rather straightforward semantics: the famous Scott's LCF, among others. PCF is an extension of the famous Goedel System T, which admitted only primitive recursion on integers. PCF is computationally complete (e.g., it is a model of Kleene's partial recursive functions) and hence can, in principle, represent anything computable. One has to distinguish however an encoding of a program via a global transformation (continuations-passing transformation or an encoding into a Turing machine or partial recursive function) from a local encoding of operations as macros/procedures, in terms of PCF primitives. The present page concerns the latter: macro-expressiveness. It is not to be taken for granted in PCF and, at times, is unexpected. Our encodings are type sound and do not introduce partiality. In fact, many of them hold already in System T or even weaker systems. They encodings also reasonably efficient, with no exponential blow-up.
PCF is a real programming language: it is a subset of a typed functional programming language such as OCaml. All encodings described on the present page have been implemented in OCaml and tested: see the accompanying complete code. We also use the OCaml notation to present PCF code on this page.
The trigger for developing the PCF encodings were much appreciated discussions with Naoki Kobayashi in September 2011, in the context of program verification via Higher-Order Recursion Schemes (HORS). Then used version of HORS could not cope with polymorphic and recursive types; hence simply-typed encodings were sought. Many of the presented encodings were developed at that time. I also thank Sam Lindley and John Longley for literature references.
Beyond Church encoding: Boehm-Berarducci isomorphism of algebraic data types and polymorphic lambda-terms
This page is about simple types, which are the types of the form
t ::= int | t -> tWe shall use meta-variables
t, and sometimes
s, to refer to simple types.
We will also be dealing with simply-typed
lambda-calculi. All of them have variables, lambda-abstractions and
applications -- and the constants of the
int type: the numerals
The calculi differ in what else they have. The calculus to be called
L0 has nothing else. The calculus
LB has the family of constants
if0: int -> t -> t -> t for all types
t, with the following
if0 0 e1 e2 = e1 if0 n e1 e2 = e2, for any n other than 0for all terms
e2of the same type. We will write the application
if0 n e1 e2in a more familiar way:
if n=0 then e1 else e2. The calculus
LB, let alone
L0, supports no other operation on integers: not even the successor. Next is System T, which has successor operation on integers and the family of constants
R: t -> (t->int->t) -> int -> t(Goedel recursor) with the conversion rules
R e1 e2 0 = e1 R e1 e2 n = e2 (R e1 e2 m) m, where n is a successor of mNow we can do integer arithmetic: addition, multiplication, etc. It is easy to see that the term
fun n x y -> R x (fun x' y' -> y) n(where
fun x -> edenotes a lambda-abstraction) `behaves just like
if0': it has the same type as
if0and its application to three arguments converts (using beta- and R-conversion) like
if0. We say:
if0is macro-expressible (some say, `definable') in System T (and hence System T subsumes
LB). The topic of this page is to describe other, more interesting, macro-expressibilities. Since we are only after macro-expressibility, we will often drop `macro'. Finally, there is PCF: it has successor and predecessor as constants, as well as the family of constants
fix: (t->t) -> tfor all types
t, with the conversion rule
fix e = e (fix e)PCF supports (i.e., expresses) all arithmetic on integers; it also expresses
R. It is also the first calculus with partiality: in the other calculi, any term is convertible to an numeral, when applied to the right number of arguments, of the right type but otherwise arbitrary. Not so in PCF.
In the rest of this section we will be dealing with the calculus
which is included in all the others. Hence the basic facts described
here are the most general.
The first, very general and useful fact, is that every simple type is
inhabitable: for any
t there is a closed term of that type. Indeed, for
int we have the term
0. Any other type can be written in
t1 -> ... -> int and is hence inhabitable by a constant
fun x ... -> 0.
t_in for an arbitrary inhabitant of the type
The next useful notion is encoding, or injection/projection pair. Two
inj: t -> u and
prj: u -> t called injection/projection pair
if for all terms
e of type
prj (inj e) can be converted to
e. We shall often say: the composition
prj . inj is the
identity. In other words,
t is encodable as
providing the encoding and
prj decoding. The requirement
prj . inj
being the identity then means that what is encoded can be
decoded. Mathematically speaking,
t is called a retract of
the corresponding pair of terms
prj called the retraction,
t < u. Incidentally, if
inj . prj is also the identity,
prj are called bijection, and the types
isomorphic. For a general retraction, however,
inj is not onto:
there are some terms of type
u that are not convertible to any
inj e, whatever
e may be be (that is, not everything in
u is an
encoding of some
A useful fact about retractions (to be called (F) and used soon)
is that if
t < u and
t' < u' then
(t->t') < (u->u'). Look closely: it is as simple as that,
there is no contra-variance. Therefore,
(t'->t) < (u'->u) also holds. The proof is simple: if
inj_t, prj_t is the retraction
t < u and
inj_t', prj_t' is the retraction
t' < u', then the retraction
(t->t') < (u->u') is defined as
let inj : (t->t') -> (u->u') = fun f -> inj_t' . f . prj_t let prj : (u->u') -> (t->t') = fun f -> prj_t' . f . inj_tIt is easy to verify that
prj . injis indeed the identity.
Another useful and interesting fact (to be called (I)) about simple
types is that
int is encodable in any type:
int < t. The proof is
simple induction. The base case
int < int is obvious. In the
inductive case we have to show
int < (t->u) where
int < u, with
prj_u. Clearly, an
int can always be
turned into a function: the constant function. Thus we have
let inj : int -> (t -> u) = fun x -> fun _ -> inj_u x let prj : (t -> u) -> int = fun f -> prj_u (f t_in)Recall,
t_inis an inhabitant of
t. Again, the verification
prj . injbeing the identity is easy.
We come to the main result of this section: union. For any two simple
u there exists a simple type, to be denoted as
t <+> u, that encodes both:
t < (t <+> u) and
u < (t <+> u). That is, for any two types there is a type that is
`big enough' for both.
The proof is inductive and straightforward. It goes without saying
that if two types are the same, they are the union:
t <+> t = t.
The retractions are the identities -- quite in the spirit of C.
t < u then
t <+> u = u.
int is encodable in any type (fact (I)), the only case to consider is
(t1 -> t2) <+> (u1 -> u2),
assuming by induction that there exist
t1 <+> u1 and
t2 <+> u2.
The union is then
(t1 -> t2) <+> (u1 -> u2) = (t1 <+> u1) -> (t2 <+> u2)The corresponding retractions
(t1 -> t2) < ((t1 <+> u1) -> (t2 <+> u2))and
(u1 -> u2) < ((t1 <+> u1) -> (t2 <+> u2))follow from the basic fact (F) about retractions.
In OCaml, the retraction interface can be represented as the signature
module type retraction = sig type t type u (* the following must satisfy prj . inj = id *) val inj : t -> u val prj : u -> t endand the union of
module type union = sig type t1 type t2 type union module R1 : (retraction with type t = t1 and type u = union) module R2 : (retraction with type t = t2 and type u = union) end
The code contains the implementation, following the inductive rules above. The next section shows an example.
John Longley and Dag Normann: Higher-Order Computability
In `Theory and Applications of Computability' series, Springer (2015)
The basic fact (F) is Prop. 4.2.1i(3). The basic fact (I) and the existence of the union can be gleaned from Prop. 4.2.3 of the book. That proposition actually deals with more general lambda-algebras and pure types, and a bit more difficult to understand. Its construction is also more complex than needed for our purposes.
John Longley: The encodability hierarchy for PCF types
April 30, 2019 <https://arxiv.org/pdf/1806.00344.pdf>
The first section gives a very good introduction to the area and its terminology.
if0. Thus the encoding works in the calculus
LBand those containing it.
Before getting to the encoding, we have to discuss what the result must be. What is a pair? One answer is the facility with the following interface:
type ('a,'b) pair (* abstract *) val pair : 'a -> 'b -> ('a,'b) pair val fst : ('a,'b) pair -> 'a val snd : ('a,'b) pair -> 'bHowever, our calculi have no polymorphism and type variables and no way to express even the types in that interface. Still, the above definition of pairs is not wrong, if we understand
'bas schematic variables and the interface as a schema (cf. axiom schemas in first-order logic). Upon such reading, the interface does not prescribe the single function
pairthat takes two arguments of arbitrary types. What is prescribed is that for any two specific types
t2, there must be a type
tpairand (total) functions
pair_t1t2 : t1 -> t2 -> tpair,
fst_t1t2 : tpair -> t1,
snd : tpair -> t2satisfying the obvious properties
fst_t1t2 (pair_t1t2 x y) = x snd_t1t2 (pair_t1t2 x y) = yfor any
It may take a moment to realize that since the type of
pair_t1t2 : t1 -> t2 -> tpair is specific to the particular types
t2, so may be the term
pair_t1t2 itself. In OCaml,
Haskell, etc., pairs are polymorphic: there exists one
single term (pairing function) that builds pairs of all possible
Such polymorphic pairs is just one implementation choice. The
other choice is type-specific pairing: the representation and
construction of a pair of
t2 is not uniform but specific to
t2. (The most well-known type-specific operation
is equality: each type has its own way of comparing its values.)
Thus the interface of simply-type pairs is expressed by the following OCaml signature:
module type pairs = sig type t1 type t2 type tpair val pair : t1 -> t2 -> tpair val fst : tpair -> t1 val snd : tpair -> t2 end(Not stated explicitly but important are the requirements of being total and satisfying the fst-pair and snd-pair equational laws above.) We demonstrate that an implementation of this interface exists for any simple types
The implementation is trivial. Recall the encoding of pairs in untyped lambda calculus:
let pair x y = fun t -> if t = 0 then x else yIt won't work exactly like that in the typed setting because
ymay have different types. That is where the union comes in:
let pair : t1 -> t2 -> (int -> (t1 <+> t2)) = fun x y -> fun t -> if t = 0 then inj_t1 x else inj_t2 y let fst : (int -> (t1 <+> t2)) -> t1 = fun p -> prj_t1 (p 0) let snd : (int -> (t1 <+> t2)) -> t2 = fun p -> prj_t2 (p 1)Since the union
t1 <+> t2exists for any simple types
t2, so is the pairing. Thus defined
sndare clearly total and satisfy the expected equational laws. In the following, we write the type
int -> (t1 <+> t2)as
t1 * t2.
t < u, then their product is the simple type
int -> u.
In OCaml, the construction can be realized as a functor
module Pair(U:union) = struct type t1 = U.t1 type t2 = U.t2 type tpair = int -> U.union let pair : t1 -> t2 -> tpair = fun x y -> fun t -> if t = 0 then U.R1.inj x else U.R2.inj y let fst : tpair -> t1 = fun p -> p 0 |> U.R1.prj let snd : tpair -> t2 = fun p -> p 1 |> U.R2.prj end
Here is an example. First we build the
union implementations for two
module R_same_int = RSame(struct type t = int end) module U_same_int = URetr(R_same_int) (* URetr builds a union implementation from a retraction *)and the types
int->int(from the retraction
int < (int->int))
module Ui_ii = URetr(Rint(R_same_int) (struct type ta = int let inhabitant = 0 end))The pair of
module Pairi_ii = Pair(Ui_ii)that can be used as expected:
Pairi_ii.(pair 0 succ |> fst)gives
Pairi_ii.(pair 0 succ |> snd)is the successor, which can be verified by checking that
Pairi_ii.(pair 0 succ |> snd) 10evaluates to
11. The accompanying code has more examples.
John Longley and Dag Normann: Higher-Order Computability. In
`Theory and Applications of Computability' series, Springer (2015)
Product types are discussed in Sec 4.2.2, but in a far more general setting and different aims. Our pair construction is a particular case of the categorical construction in Prop. 4.2.5.
The interface of sums is expressed by the following OCaml signature
module type sums = sig type t1 type t2 type tsum val inl : t1 -> tsum val inr : t2 -> tsum val is_left : tsum -> bool val is_right : tsum -> bool val left : tsum -> t1 val right : tsum -> t2 end(In
LB, a boolean is represented as an integer: 0 is
falseand non-zero is
true-- like in C.) These functions must satisfy the equational laws
is_left (inl x) === true left (inl x) === x is_right (inr x) === true right (inr x) === x is_left === not . is_rightStrictly speaking,
rightoperations do not have to be total. They are in our implementation:
left (inr x)does not crash and does return some value. The implementation is straightforward: pairing the union with the discriminator, 0 for left and 1 for the right alternative.
module Sum(U:union)(P:pairs with type t1=int and type t2=U.union) = struct type t1 = U.t1 type t2 = U.t2 type tsum = P.tpair let inl : t1 -> tsum = fun x -> U.R1.inj x |> P.pair 0 let inr : t2 -> tsum = fun x -> U.R2.inj x |> P.pair 1 let is_left : tsum -> bool = fun x -> P.fst x = 0 let is_right : tsum -> bool = fun x -> P.fst x = 1 let left : tsum -> t1 = fun x -> P.snd x |> U.R1.prj let right : tsum -> t2 = fun x -> P.snd x |> U.R2.prj endIt is just as clear that the required properties of these operations are satisfied. In the following, we write the sum type of
t1 + t2.
A different encoding of sums is suggested (as an exercise) in the
Normann's book. Since all simple types are inhabitable, sums can be
represented as products: a sum value
t1 + t2 pairs the
with an arbitrary inhabitant of
t2. We also need a discriminator,
1, to tell which of the two components of the pair contains the `valid'
value. All in all, Longley and Normann represent
int * t1 * t2. This encoding is more complicated than ours, but it
has a nice special case: option types. They can be
realized hence as
type toption = int * t let none : toption = pair 0 t_in let some : t -> toption = pair 1
John Longley and Dag Normann: Higher-Order Computability. In
`Theory and Applications of Computability' series, Springer (2015)
LBby iterating the pair construction (as was noted already by Cantor). Lists are more complex. The size of an n-tuple is known in advance; therefore, we may define tuple constructors and accessors for this specific size. In contrast, each list value is a sequence of its own size. List operations should be able to handle any lists whatever their size. Correspondingly, whereas n-tuples may have elements of various types, lists, in the simply typed setting, have to have elements of the same type.
What are lists, exactly? First comes to mind is the familiar (from
Lisp) interface of
operations. However, when we try to implement, say, list reversal or
concatenation, we realize something else is needed: the ability to
cons statically unknown
number of times. Thus is, we need recursion, and hence calculus as
rich as PCF.
On further thought, we notice that lists are the generalization of
natural numbers: think of unary numerals. The list interface then
comes along the lines of System T: for all simple types
t there exists a
tlist with the following constants
nil : tlist cons : t -> tlist -> tlist iter : u -> (u -> t -> tlist -> u) -> t -> uwith
nilplaying the role of
consthe role of successor, and
iterthe role of Goedel recursor R (the type
uis an arbitrary simple type). It is easy to see that
zip, and all other familiar operations on lists are implementable using this interface. (See also the accompanying code).
Which calculi can implement this list interface is an interesting
question, investigated in detail in Cegielski and Richard's paper. One
important answer to it was given already by Goedel: any calculus that
has list decoding/encoding and natural number comparison can implement
any primitive recursive function -- and hence full arithmetic. Its
theory is therefore undecidable. That a theory of lists includes
arithmetic should not be too surprising: after all, lists are a
generalization of natural numbers. Therefore, we need at least System
T to implement lists. In fact, System T actually suffices, as
demonstrated below. That arithmetic is both necessary and sufficient
for list encoding was shown by Goedel in the course of proving the
incompleteness theorem. His encoding of number sequences, relying on
exponentiation and factorization, is impractical. (In fact, no
polynomial encoding of an integer list in
We now demonstrate that System T suffices to encode lists in the
higher-order (simply-typed) setting, and that encoding is
polynomial. In fact, it has the same or even better complexity than
the familiar linked lists. The idea is simple: a list is a finite map,
whose domain is
[0..n-1] for some n. Thus the type
tlist of lists
with elements of type
t is simply
tlist = int * (int -> t)The first component of the pair is the length of the list (which determines the domain of the finite map). The second component is the finite map: from the index
n-i-1-th element of the list where
nis list length. Such right-to-left indexing makes many operations faster. Pairs are expressible in System T, as shown earlier. (Below we use the OCaml notation for pairs for clarity.)
The operations of the list interface can then be defined as
let nil : tlist = (0, fun _ -> t_in) let cons : t -> tlist -> tlist = fun x (c,memb) -> (c+1, fun i -> if i = c then x else memb i) let iter : u -> (u -> t -> tlist -> u) -> tlist -> u = fun z f (c,memb) -> recursor z (fun z c -> f z (memb c) (c,memb)) cwhere
recursoris Goedel recursor (the constant
Rof System T), and
t_inis an inhabitant of type
Incidentally, our implementation of lists readily generalizes to streams: potentially infinite lists.
Patrick Cegielski and Denis Richard. On arithmetical first-order theories
allowing encoding and decoding of lists. Theoretical Computer Science,
222(12):55 - 75, 1999.
The paper shows several encodings of lists of natural numbers, including historical encodings by Goedel, Ackermann and Quine. They all require addition and a multiplication-like operation. The paper investigates the decidability of theories permitting encodings/decoding of lists and tuples.