From oleg@pobox.com Sun Oct 18 13:57:40 1998 From: oleg@pobox.com Subject: Expressing formal proofs in a computer language: Y Scheme Date: Sun, 18 Oct 1998 20:58:45 GMT Reply-To: oleg@pobox.com Keywords: algorithm analysis, priority queue, formal proof, complexity, Scheme Newsgroups: comp.lang.scheme,comp.theory,sci.logic Organization: Deja News - The Leader in Internet Discussion Summary: Scheme as a language to conduct formal proofs of algorithms expressed in Scheme X-Article-Creation-Date: Sun Oct 18 20:58:45 1998 GMT X-Http-Proxy: 1.0 x2.dejanews.com:80 (Squid/1.1.22) for client 192.16.167.17 X-Http-User-Agent: Mozilla/4.07 (Macintosh; I; PPC, Nav) Sender: www@dejanews.com Content-Length: 5015 Status: RO This article is to talk about languages to talk about languages: how do we reason about meaning of a phrase: a piece of code. How do we go about proving correctness of algorithm's implementation? Which words do we use? ML has a wonderful property: it is a language that easily expresses reasoning about its own devices. This article is intended to show that Scheme can claim the same fame: Scheme can be used to spell out and conduct formal proofs of algorithms written in Scheme, thus Scheme is its own meta-language as well. To start with, how do we normally reason about algorithms and programs? Here are two random examples. - The FFT algorithm. The original paper by Cooley and Tukey derived one of the most beautiful algorithms in standard mathematical notation, with many a sum sign and two-three-level subscripts. Any FFT implementation is to be judged correct if it faithfully computes all the sums and products specified in the paper in the right order. - Dijkstra's shortest-path algorithm. The algorithm is explained by drawing graphs, sets of already reached vertices etc. on a piece of paper. The algorithm is formally proved using a set-theoretical notation. Again, for an implementation to be correct, it should just "follow the proof". These typical examples show that normally, the regular algebraic/set-theoretical language is employed to write out propositions and carry out constructive proofs. Another - an algorithmic -- language is then use to encode these constructions for a computer. There seems to be a more direct way, when the _same language_ is used throughout: to express an algorithm, to formulate propositions about the algorithm and its correctness, and to conduct proofs of these propositions. The other day I attempted to formally _prove_ space and time complexities of a particular Scheme code that builds a priority queue (heap). I claimed that a data structure -- a binary tree -- being built by the algorithm is skewed given an unfortunate set of input data. This conjecture had to be proved, of course. As this article is concerned only with the form of the proof, its substance will not be discussed here. The complete proof was posted on comp.lang.scheme and can also be found in: http://pobox.com/~oleg/ftp/Scheme/priority-queue-complexity.txt I started in the "traditional" way, by drawing the trees for some particular cases. But then I realized that I could succinctly represent my conjecture as: (PQ n) = (list 1 '(2 () ()) (PQ-map plus2 (PQ (- n 2))), n>2 (PQ 1) = '(1 () ()) (PQ 2) = '(1 (2 () ()) ()) If I wrote this in a journal paper, I'd spell (PQ n) as PQn defined as a priority queue holding elements 1 through n, added in order. I would've used fancier symbols for PQ-map, plus2, and maybe even draw the 'list' clause as a tree node. The claim above would then look like the one in a normal mathematical notation. There is a dramatic difference however. (PQ n) and (PQ-map) are defined as (define (PQ k) (if (zero? k) priqueue-nil (priqueue-insert k (PQ (- k 1))))) (define (PQ-map f pq) (if (null? pq) '() (list (f (car pq)) (PQ-map f (cadr pq)) (PQ-map f (caddr pq))))) and are _valid Scheme functions_. priqueue-insert is the the function under investigation, the one that builds the heap and whose property I tried to prove. Thus the conjecture is a valid scheme code, and can be evaluated, for particular values of n of course. Which I _actually_ did. Before embarking on a formal proof, I wanted to do a quick spot check; just to make sure that my contention isn't an outright lie. So I evaluated: (equal? (PQ 42) (list 1 '(2 () ()) (PQ-map plus2 (PQ 40)))) and the Scheme interpreter said #t. I went on with the proof by induction, in which I manually traced the execution of (priqueue-insert (+ k 1) (PQ k)) assuming that (PQ k) has a particular form above. In essence I "partially" or "symbolically" evaluated (PQ (+ k 1)) with k being a "free" identifier -- an induction variable. By tracing the code -- following control, binding variables, making judgments at branching points and making sure the judgments are provable -- I did most of the job of a Scheme interpreter. As the statement of the theorem is written in Scheme, and the proof is being conducted in Scheme, can a Scheme interpreter be gainfully employed? Of course the interpreter will need human guidance at judgment points, in particular, when it encounters a "free identifier". This question did occur to me, but I did not answer it. Besides, this article is not about formal theorem provers. It is about one example how an _algorithmic_ language was used to express and carry out _formal_ proofs. One can really *think* in Scheme. This may offer advantages of calling upon a Scheme interpreter for assistance in some particular, or more formal, cases. I must apologize that this article ends without offering any more positive or general conclusion. It was merely a contemplation: reflections often are.