From oleg@pobox.com Sun Oct 18 13:57:40 1998
From: oleg@pobox.com
Subject: Expressing formal proofs in a computer language: Y Scheme
Date: Sun, 18 Oct 1998 20:58:45 GMT
Reply-To: oleg@pobox.com
Keywords: algorithm analysis, priority queue, formal proof, complexity, Scheme
Newsgroups: comp.lang.scheme,comp.theory,sci.logic
Organization: Deja News - The Leader in Internet Discussion
Summary: Scheme as a language to conduct formal proofs of algorithms expressed in Scheme
X-Article-Creation-Date: Sun Oct 18 20:58:45 1998 GMT
X-Http-Proxy: 1.0 x2.dejanews.com:80 (Squid/1.1.22) for client 192.16.167.17
X-Http-User-Agent: Mozilla/4.07 (Macintosh; I; PPC, Nav)
Sender: www@dejanews.com
Content-Length: 5015
Status: RO

This article is to talk about languages to talk about languages: how
do we reason about meaning of a phrase: a piece of code. How do we go
about proving correctness of algorithm's implementation? Which words
do we use? ML has a wonderful property: it is a language that easily
expresses reasoning about its own devices. This article is intended to
show that Scheme can claim the same fame: Scheme can be used to spell
out and conduct formal proofs of algorithms written in Scheme, thus
Scheme is its own meta-language as well.

To start with, how do we normally reason about algorithms and
programs? Here are two random examples.
- The FFT algorithm. The original paper by Cooley and Tukey derived
one of the most beautiful algorithms in standard mathematical
notation, with many a sum sign and two-three-level subscripts. Any FFT
implementation is to be judged correct if it faithfully computes all
the sums and products specified in the paper in the right order.
- Dijkstra's shortest-path algorithm. The algorithm is explained by
drawing graphs, sets of already reached vertices etc. on a piece of
paper. The algorithm is formally proved using a set-theoretical
notation. Again, for an implementation to be correct, it should just
"follow the proof".

These typical examples show that normally, the regular
algebraic/set-theoretical language is employed to write out
propositions and carry out constructive proofs. Another - an
algorithmic -- language is then use to encode these constructions for
a computer. There seems to be a more direct way, when the _same
language_ is used throughout: to express an algorithm, to formulate
propositions about the algorithm and its correctness, and to conduct
proofs of these propositions.

The other day I attempted to formally _prove_ space and time
complexities of a particular Scheme code that builds a priority queue
(heap). I claimed that a data structure -- a binary tree -- being
built by the algorithm is skewed given an unfortunate set of input
data. This conjecture had to be proved, of course. As this article is
concerned only with the form of the proof, its substance will not be
discussed here. The complete proof was posted on comp.lang.scheme and
can also be found in:
        http://pobox.com/~oleg/ftp/Scheme/priority-queue-complexity.txt

I started in the "traditional" way, by drawing the trees for some
particular cases. But then I realized that I could succinctly
represent my conjecture as:
  (PQ n) = (list 1 '(2 () ()) (PQ-map plus2 (PQ (- n 2))), n>2
  (PQ 1) = '(1 () ())
  (PQ 2) = '(1 (2 () ()) ())

If I wrote this in a journal paper, I'd spell (PQ n) as PQ<sub>n</sub>
defined as a priority queue holding elements 1 through n, added in
order. I would've used fancier symbols for PQ-map, plus2, and maybe
even draw the 'list' clause as a tree node. The claim above would then
look like the one in a normal mathematical notation. There is a
dramatic difference however. (PQ n) and (PQ-map) are defined as

(define (PQ k)
  (if (zero? k) priqueue-nil
    (priqueue-insert k (PQ (- k 1)))))

(define (PQ-map f pq)
  (if (null? pq) '()
   (list (f (car pq)) (PQ-map f (cadr pq)) (PQ-map f (caddr pq)))))

and are _valid Scheme functions_. priqueue-insert is the the function
under investigation, the one that builds the heap and whose property I
tried to prove. Thus the conjecture is a valid scheme code, and can be
evaluated, for particular values of n of course.  Which I _actually_
did. Before embarking on a formal proof, I wanted to do a quick spot
check; just to make sure that my contention isn't an outright lie. So
I evaluated:
        (equal?
          (PQ 42)
          (list 1 '(2 () ()) (PQ-map plus2 (PQ 40))))

and the Scheme interpreter said #t. I went on with the proof by
induction, in which I manually traced the execution of
(priqueue-insert (+ k 1) (PQ k)) assuming that (PQ k) has a particular
form above. In essence I "partially" or "symbolically" evaluated (PQ
(+ k 1)) with k being a "free" identifier -- an induction variable. By
tracing the code -- following control, binding variables, making
judgments at branching points and making sure the judgments are
provable -- I did most of the job of a Scheme interpreter. As the
statement of the theorem is written in Scheme, and the proof is being
conducted in Scheme, can a Scheme interpreter be gainfully employed?
Of course the interpreter will need human guidance at judgment points,
in particular, when it encounters a "free identifier". This question
did occur to me, but I did not answer it. Besides, this article is not
about formal theorem provers. It is about one example how an
_algorithmic_ language was used to express and carry out _formal_
proofs. One can really *think* in Scheme. This may offer advantages of
calling upon a Scheme interpreter for assistance in some particular,
or more formal, cases.

I must apologize that this article ends without offering any more
positive or general conclusion. It was merely a contemplation:
reflections often are.