# Simply-typed encodings: PCF considered as unexpectedly expressive programming language

We describe lesser-known and unexpected macro-expressivity: implementations of pairs, sums, lists, trees, objects, generally recursive data types, imperative programs in PCF: simply-typed lambda calculus with integers and fixpoint. These implementations are as `macros' (functions), requiring no global program transformations. They maintain type soundness, introducing no partiality. In fact, many realizations do not require fixpoint and hence hold already in System T or even weaker systems. The implementations introduce no exponential blowup.

## Introduction

Simply-typed lambda calculus is simple and logical in its semantics, which helps in verification and assured program development. It is also computationally deficient: the pure calculus (without constants) cannot even represent integer predecessor, let alone something more complicated as objects.

The common way of raising expressiveness is complicating types. System F, the polymorphic calculus, can represent integer arithmetic -- and via the Boehm-Berarducci (or Goedel) encoding, inductive data structures such as lists and trees. System F is strongly normalizing, maintaining connections with logic -- but its semantics is quite more complicated. Also, Boehm-Berarducci (let alone Goedel) encodings are inherently inefficient. One may also complicate the type system by adding recursive types, which lets us embed untyped lambda calculus and represent anything, and efficiently -- at the cost of even more complicated semantics.

A different direction is to maintain simple types but add constants. Just adding booleans (or integer literals with zero-branching but no other operations on them) lets us define pairs and sums -- the known, but not widely, result. What else can we express if we add successor or addition, etc?

If we add the basic integer data type and the constants for successor, predecessor, branching-on-zero, fixpoint, we get the calculus called PCF. Although its functions are generally partial, it does have a rather straightforward semantics: the famous Scott's LCF, among others. PCF is an extension of the famous Goedel System T, which admitted only primitive recursion on integers. PCF is computationally complete (e.g., it is a model of Kleene's partial recursive functions) and hence can, in principle, represent anything computable. One has to distinguish however an encoding of a program via a global transformation (continuations-passing transformation or an encoding into a Turing machine or partial recursive function) from a local encoding of operations as macros/procedures, in terms of PCF primitives. The present page concerns the latter: macro-expressiveness. It is not to be taken for granted in PCF and, at times, is unexpected. Our encodings are type sound and do not introduce partiality. In fact, many of them hold already in System T or even weaker systems. They encodings also reasonably efficient, with no exponential blow-up.

PCF is a real programming language: it is a subset of a typed functional programming language such as OCaml. All encodings described on the present page have been implemented in OCaml and tested: see the accompanying complete code. We also use the OCaml notation to present PCF code on this page.

The trigger for developing the PCF encodings were much appreciated discussions with Naoki Kobayashi in September 2011, in the context of program verification via Higher-Order Recursion Schemes (HORS). Then used version of HORS could not cope with polymorphic and recursive types; hence simply-typed encodings were sought. Many of the presented encodings were developed at that time. I also thank Sam Lindley and John Longley for literature references.

References
Gordon Plotkin: LCF considered as a programming language
Theoretical Computer Science, 1977, 5(3), 223–255.

## Terminology and basic facts: Encodings, Unions

This warm-up section introduces the terminology, calculi, and the basic facts used throughout the page. A particular useful fact concerns non-discriminated unions: the ones that should be familiar from C. Like in C, they are used to pass around values of several types as if they all had the same type.

```    t ::= int | t -> t
```
We shall use meta-variables `t`, and sometimes `u` and `s`, to refer to simple types.

We will also be dealing with simply-typed lambda-calculi. All of them have variables, lambda-abstractions and applications -- and the constants of the `int` type: the numerals `0`, `1`, etc. The calculi differ in what else they have. The calculus to be called `L0` has nothing else. The calculus `LB` has the family of constants `if0: int -> t -> t -> t` for all types `t`, with the following conversion rules

```    if0 0 e1 e2 = e1      if0 n e1 e2 = e2, for any n other than 0
```
for all terms `e1`, `e2` of the same type. We will write the application `if0 n e1 e2` in a more familiar way: `if n=0 then e1 else e2`. The calculus `LB`, let alone `L0`, supports no other operation on integers: not even the successor. Next is System T, which has successor operation on integers and the family of constants `R: t -> (t->int->t) -> int -> t` (Goedel recursor) with the conversion rules
```    R e1 e2 0 = e1     R e1 e2 n = e2 (R e1 e2 m) m,  where n is a successor of m
```
Now we can do integer arithmetic: addition, multiplication, etc. It is easy to see that the term
```    fun n x y -> R x (fun x' y' -> y) n
```
(where `fun x -> e` denotes a lambda-abstraction) `behaves just like `if0`': it has the same type as `if0` and its application to three arguments converts (using beta- and R-conversion) like `if0`. We say: `if0` is macro-expressible (some say, `definable') in System T (and hence System T subsumes `LB`). The topic of this page is to describe other, more interesting, macro-expressibilities. Since we are only after macro-expressibility, we will often drop `macro'. Finally, there is PCF: it has successor and predecessor as constants, as well as the family of constants `fix: (t->t) -> t` for all types `t`, with the conversion rule
```    fix e = e (fix e)
```
PCF supports (i.e., expresses) all arithmetic on integers; it also expresses `R`. It is also the first calculus with partiality: in the other calculi, any term is convertible to an numeral, when applied to the right number of arguments, of the right type but otherwise arbitrary. Not so in PCF.

In the rest of this section we will be dealing with the calculus `L0`, which is included in all the others. Hence the basic facts described here are the most general.

The first, very general and useful fact, is that every simple type is inhabitable: for any `t` there is a closed term of that type. Indeed, for `int` we have the term `0`. Any other type can be written in the form `t1 -> ... -> int` and is hence inhabitable by a constant function `fun x ... -> 0`. We write `t_in` for an arbitrary inhabitant of the type `t`.

The next useful notion is encoding, or injection/projection pair. Two terms `inj: t -> u` and `prj: u -> t` called injection/projection pair if for all terms `e` of type `t`, `prj (inj e)` can be converted to `e`. We shall often say: the composition `prj . inj` is the identity. In other words, `t` is encodable as `u`, with `inj` providing the encoding and `prj` decoding. The requirement `prj . inj` being the identity then means that what is encoded can be decoded. Mathematically speaking, `t` is called a retract of `u`, and the corresponding pair of terms `inj` and `prj` called the retraction, written `t < u`. Incidentally, if `inj . prj` is also the identity, then `inj` and `prj` are called bijection, and the types `t` and `u` isomorphic. For a general retraction, however, `inj` is not onto: there are some terms of type `u` that are not convertible to any `inj e`, whatever `e` may be be (that is, not everything in `u` is an encoding of some `t`).

A useful fact about retractions (to be called (F) and used soon) is that if `t < u` and `t' < u'` then `(t->t') < (u->u')`. Look closely: it is as simple as that, there is no contra-variance. Therefore, `(t'->t) < (u'->u)` also holds. The proof is simple: if `inj_t, prj_t` is the retraction `t < u` and `inj_t', prj_t'` is the retraction `t' < u'`, then the retraction `(t->t') < (u->u')` is defined as

```    let inj : (t->t') -> (u->u') = fun f -> inj_t' . f . prj_t
let prj : (u->u') -> (t->t') = fun f -> prj_t' . f . inj_t
```
It is easy to verify that `prj . inj` is indeed the identity.

Another useful and interesting fact (to be called (I)) about simple types is that `int` is encodable in any type: `int < t`. The proof is simple induction. The base case `int < int` is obvious. In the inductive case we have to show `int < (t->u)` where `int < u`, with retractions `inj_u` and `prj_u`. Clearly, an `int` can always be turned into a function: the constant function. Thus we have

```    let inj : int -> (t -> u) = fun x -> fun _ -> inj_u x
let prj : (t -> u) -> int = fun f -> prj_u (f t_in)
```
Recall, `t_in` is an inhabitant of `t`. Again, the verification `prj . inj` being the identity is easy.

We come to the main result of this section: union. For any two simple types `t` and `u` there exists a simple type, to be denoted as `t <+> u`, that encodes both: `t < (t <+> u)` and `u < (t <+> u)`. That is, for any two types there is a type that is `big enough' for both.

The proof is inductive and straightforward. It goes without saying that if two types are the same, they are the union: `t <+> t = t`. The retractions are the identities -- quite in the spirit of C. Generalizing, if `t < u` then `t <+> u = u`. Since `int` is encodable in any type (fact (I)), the only case to consider is `(t1 -> t2) <+> (u1 -> u2)`, assuming by induction that there exist `t1 <+> u1` and `t2 <+> u2`. The union is then

```    (t1 -> t2) <+> (u1 -> u2)  =  (t1 <+> u1) -> (t2 <+> u2)
```
The corresponding retractions `(t1 -> t2) < ((t1 <+> u1) -> (t2 <+> u2))` and `(u1 -> u2) < ((t1 <+> u1) -> (t2 <+> u2))` follow from the basic fact (F) about retractions.

In OCaml, the retraction interface can be represented as the signature

```    module type retraction = sig
type t
type u
(* the following must satisfy prj . inj = id *)
val inj : t -> u
val prj : u -> t
end
```
and the union of `t1` and `t2` as
```    module type union = sig
type t1
type t2
type union
module R1 : (retraction with type t = t1 and type u = union)
module R2 : (retraction with type t = t2 and type u = union)
end
```

The code contains the implementation, following the inductive rules above. The next section shows an example.

Version
The current version is December 2021
References
systemT_datatype.ml [5K]
The complete OCaml source code with tests

John Longley and Dag Normann: Higher-Order Computability
In `Theory and Applications of Computability' series, Springer (2015)
The basic fact (F) is Prop. 4.2.1i(3). The basic fact (I) and the existence of the union can be gleaned from Prop. 4.2.3 of the book. That proposition actually deals with more general lambda-algebras and pure types, and a bit more difficult to understand. Its construction is also more complex than needed for our purposes.

John Longley: The encodability hierarchy for PCF types
April 30, 2019 <https://arxiv.org/pdf/1806.00344.pdf>
The first section gives a very good introduction to the area and its terminology.

## Pairs

The just presented encoding of non-discriminated unions induces the encoding on pairs (products). The additional requirement is the constant `if0`. Thus the encoding works in the calculus `LB` and those containing it.

Before getting to the encoding, we have to discuss what the result must be. What is a pair? One answer is the facility with the following interface:

```    type ('a,'b) pair   (* abstract *)
val pair : 'a -> 'b -> ('a,'b) pair
val fst : ('a,'b) pair -> 'a
val snd : ('a,'b) pair -> 'b
```
However, our calculi have no polymorphism and type variables and no way to express even the types in that interface. Still, the above definition of pairs is not wrong, if we understand `'a` and `'b` as schematic variables and the interface as a schema (cf. axiom schemas in first-order logic). Upon such reading, the interface does not prescribe the single function `pair` that takes two arguments of arbitrary types. What is prescribed is that for any two specific types `t1` and `t2`, there must be a type `tpair` and (total) functions `pair_t1t2 : t1 -> t2 -> tpair`, `fst_t1t2 : tpair -> t1`, `snd : tpair -> t2` satisfying the obvious properties
```    fst_t1t2 (pair_t1t2 x y) = x          snd_t1t2 (pair_t1t2 x y) = y
```
for any `x` of type `t1` and `y` of type `t2`.

It may take a moment to realize that since the type of `pair_t1t2 : t1 -> t2 -> tpair` is specific to the particular types `t1` and `t2`, so may be the term `pair_t1t2` itself. In OCaml, Haskell, etc., pairs are polymorphic: there exists one single term (pairing function) that builds pairs of all possible types, uniformly. Such polymorphic pairs is just one implementation choice. The other choice is type-specific pairing: the representation and construction of a pair of `t1` and `t2` is not uniform but specific to the types `t1` and `t2`. (The most well-known type-specific operation is equality: each type has its own way of comparing its values.)

Thus the interface of simply-type pairs is expressed by the following OCaml signature:

```    module type pairs = sig
type t1
type t2
type tpair
val pair : t1 -> t2 -> tpair
val fst : tpair -> t1
val snd : tpair -> t2
end
```
(Not stated explicitly but important are the requirements of being total and satisfying the fst-pair and snd-pair equational laws above.) We demonstrate that an implementation of this interface exists for any simple types `t1` and `t2`.

The implementation is trivial. Recall the encoding of pairs in untyped lambda calculus:

```    let pair x y = fun t -> if t = 0 then x else y
```
It won't work exactly like that in the typed setting because `x` and `y` may have different types. That is where the union comes in:
```    let pair : t1 -> t2 -> (int -> (t1 <+> t2)) = fun x y ->
fun t -> if t = 0 then inj_t1 x else inj_t2 y
let fst : (int -> (t1 <+> t2)) -> t1 = fun p -> prj_t1 (p 0)
let snd : (int -> (t1 <+> t2)) -> t2 = fun p -> prj_t2 (p 1)
```
Since the union `t1 <+> t2` exists for any simple types `t1` and `t2`, so is the pairing. Thus defined `pair`, `fst` and `snd` are clearly total and satisfy the expected equational laws. In the following, we write the type `int -> (t1 <+> t2)` as `t1 * t2`.

Incidentally, it `t < u`, then their product is the simple type `int -> u`.

In OCaml, the construction can be realized as a functor

```    module Pair(U:union) = struct
type t1 = U.t1
type t2 = U.t2
type tpair = int -> U.union
let pair : t1 -> t2 -> tpair = fun x y ->
fun t -> if t = 0 then U.R1.inj x else U.R2.inj y
let fst : tpair -> t1 = fun p -> p 0 |> U.R1.prj
let snd : tpair -> t2 = fun p -> p 1 |> U.R2.prj
end
```

Here is an example. First we build the `union` implementations for two `int` types

```    module R_same_int = RSame(struct type t = int end)
module U_same_int = URetr(R_same_int)   (* URetr builds a union implementation from a retraction *)
```
and the types `int` and `int->int` (from the retraction `int < (int->int)`)
```    module Ui_ii = URetr(Rint(R_same_int)
(struct type ta = int let inhabitant = 0 end))
```
The pair of `int` and `int->int` is then
```    module Pairi_ii = Pair(Ui_ii)
```
that can be used as expected: `Pairi_ii.(pair 0 succ |> fst)` gives `0`, and `Pairi_ii.(pair 0 succ |> snd)` is the successor, which can be verified by checking that `Pairi_ii.(pair 0 succ |> snd) 10` evaluates to `11`. The accompanying code has more examples.
Version
The current version is November 2021
References
systemT_datatype.ml [5K]
The complete OCaml source code with tests

John Longley and Dag Normann: Higher-Order Computability. In `Theory and Applications of Computability' series, Springer (2015)
Product types are discussed in Sec 4.2.2, but in a far more general setting and different aims. Our pair construction is a particular case of the categorical construction in Prop. 4.2.5.

## Sums

Sums are discriminated unions. To make a non-discriminated union (described earlier) into discriminated we pair it with the discriminator -- quite like discriminated (tagged) unions in C. Like pairs, sums are thus implementable already in `LB`.

The interface of sums is expressed by the following OCaml signature

```    module type sums = sig
type t1
type t2
type tsum
val inl : t1 -> tsum
val inr : t2 -> tsum
val is_left  : tsum -> bool
val is_right : tsum -> bool
val left  : tsum -> t1
val right : tsum -> t2
end
```
(In `LB`, a boolean is represented as an integer: 0 is `false` and non-zero is `true` -- like in C.) These functions must satisfy the equational laws
```    is_left  (inl x) === true     left  (inl x) === x
is_right (inr x) === true     right (inr x) === x     is_left === not . is_right
```
Strictly speaking, `left` and `right` operations do not have to be total. They are in our implementation: `left (inr x)` does not crash and does return some value. The implementation is straightforward: pairing the union with the discriminator, 0 for left and 1 for the right alternative.
```    module Sum(U:union)(P:pairs with type t1=int and type t2=U.union) = struct
type t1 = U.t1
type t2 = U.t2
type tsum = P.tpair
let inl : t1 -> tsum = fun x -> U.R1.inj x |> P.pair 0
let inr : t2 -> tsum = fun x -> U.R2.inj x |> P.pair 1
let is_left  : tsum -> bool = fun x -> P.fst x = 0
let is_right : tsum -> bool = fun x -> P.fst x = 1
let left  : tsum -> t1 = fun x -> P.snd x |> U.R1.prj
let right : tsum -> t2 = fun x -> P.snd x |> U.R2.prj
end
```
It is just as clear that the required properties of these operations are satisfied. In the following, we write the sum type of `t1` and `t2` as `t1 + t2`.

A different encoding of sums is suggested (as an exercise) in the Longley and Normann's book. Since all simple types are inhabitable, sums can be represented as products: a sum value `t1 + t2` pairs the `t1` value with an arbitrary inhabitant of `t2`. We also need a discriminator, `0` or `1`, to tell which of the two components of the pair contains the `valid' value. All in all, Longley and Normann represent `t1+t2` as `int * t1 * t2`. This encoding is more complicated than ours, but it has a nice special case: option types. They can be realized hence as

```    type toption  = int * t
let none : toption      = pair 0 t_in
let some : t -> toption = pair 1
```
Version
The current version is November 2021
References
systemT_datatype.ml [5K]
The complete OCaml source code with tests

John Longley and Dag Normann: Higher-Order Computability. In `Theory and Applications of Computability' series, Springer (2015)
Prop. 4.2.11.

## Sequences: lists and arrays

The simplest form of sequences is n-tuples, which are easily expressible in `LB` by iterating the pair construction (as was noted already by Cantor). Lists are more complex. The size of an n-tuple is known in advance; therefore, we may define tuple constructors and accessors for this specific size. In contrast, each list value is a sequence of its own size. List operations should be able to handle any lists whatever their size. Correspondingly, whereas n-tuples may have elements of various types, lists, in the simply typed setting, have to have elements of the same type.

What are lists, exactly? First comes to mind is the familiar (from Lisp) interface of `nil`, `cons`, `is_empty`, `head` and `tail` operations. However, when we try to implement, say, list reversal or concatenation, we realize something else is needed: the ability to repeatedly apply `is_empty` and `cons` statically unknown number of times. Thus is, we need recursion, and hence calculus as rich as PCF.

On further thought, we notice that lists are the generalization of natural numbers: think of unary numerals. The list interface then comes along the lines of System T: for all simple types `t` there exists a simple type `tlist` with the following constants

```    nil  : tlist
cons : t -> tlist -> tlist
iter : u -> (u -> t -> tlist -> u) -> t -> u
```
with `nil` playing the role of `0`, `cons` the role of successor, and `iter` the role of Goedel recursor R (the type `u` is an arbitrary simple type). It is easy to see that `head`, `tail`, `is_empty`, `reverse`, `append`, `zip`, and all other familiar operations on lists are implementable using this interface. (See also the accompanying code).

Which calculi can implement this list interface is an interesting question, investigated in detail in Cegielski and Richard's paper. One important answer to it was given already by Goedel: any calculus that has list decoding/encoding and natural number comparison can implement any primitive recursive function -- and hence full arithmetic. Its theory is therefore undecidable. That a theory of lists includes arithmetic should not be too surprising: after all, lists are a generalization of natural numbers. Therefore, we need at least System T to implement lists. In fact, System T actually suffices, as demonstrated below. That arithmetic is both necessary and sufficient for list encoding was shown by Goedel in the course of proving the incompleteness theorem. His encoding of number sequences, relying on exponentiation and factorization, is impractical. (In fact, no polynomial encoding of an integer list in `int` exists.)

We now demonstrate that System T suffices to encode lists in the higher-order (simply-typed) setting, and that encoding is polynomial. In fact, it has the same or even better complexity than the familiar linked lists. The idea is simple: a list is a finite map, whose domain is `[0..n-1]` for some n. Thus the type `tlist` of lists with elements of type `t` is simply

```    tlist = int * (int -> t)
```
The first component of the pair is the length of the list (which determines the domain of the finite map). The second component is the finite map: from the index `i` to the `n-i-1`-th element of the list where `n` is list length. Such right-to-left indexing makes many operations faster. Pairs are expressible in System T, as shown earlier. (Below we use the OCaml notation for pairs for clarity.)

The operations of the list interface can then be defined as

```    let nil : tlist = (0, fun _ -> t_in)

let cons : t -> tlist -> tlist = fun x (c,memb) ->
(c+1, fun i -> if i = c then x else memb i)

let iter : u -> (u -> t -> tlist -> u) -> tlist -> u = fun z f (c,memb) ->
recursor z (fun z c -> f z (memb c) (c,memb)) c
```
where `recursor` is Goedel recursor (the constant `R` of System T), and `t_in` is an inhabitant of type `t`.

Incidentally, our implementation of lists readily generalizes to streams: potentially infinite lists.

Version
The current version is January 2022
References
systemT_list.ml [3K]
The complete accompanying code with tests

Patrick Cegielski and Denis Richard. On arithmetical first-order theories allowing encoding and decoding of lists. Theoretical Computer Science, 222(12):55 - 75, 1999.
The paper shows several encodings of lists of natural numbers, including historical encodings by Goedel, Ackermann and Quine. They all require addition and a multiplication-like operation. The paper investigates the decidability of theories permitting encodings/decoding of lists and tuples.