From posting-system@google.com Fri Sep 19 06:39:19 2003 Date: Thu, 18 Sep 2003 23:39:02 -0700 From: oleg@pobox.com (oleg@pobox.com) Newsgroups: comp.lang.scheme Subject: Self-application as the fixpoint of call/cc Message-ID: <7eb8ac3e.0309182239.5a64b3b1@posting.google.com> Status: OR We shall show in which way (call/cc call/cc) is equivalent to (lambda (x) (x x)). We shall show how to express Y via call/cc. The result has a distinctive feature: Y can be expressed *without* resorting to an explicit self-application. We shall also illustrate the use of syntax-rules in proving theorems. Theorem 1. Expression - ((call/cc ... (call/cc call/cc)) p) - ((call/cc ... (call/cc (call/cc id))) p) - ((lambda (x) (x x)) p) (where p is a value) have beta-equivalent CBV CPS transforms. Here (call/cc ...) signify zero or more applications of call/cc, and id is the identity function (lambda (x) x). In other words, - (lambda (p) ((call/cc ... (call/cc call/cc)) p)) - (lambda (p) ((call/cc ... (call/cc (call/cc id))) p)) - (lambda (p) ((lambda (x) (x x)) p)) are CBV observationally equivalent. Illustration (using the higher-order syntax): (define (p x) (if (eq? x p) '(p p) `(p ,x))) ((lambda (x) (x x)) p) evaluates===> (p p) ((call/cc call/cc) p) evaluates===> (p p) ((call/cc (call/cc call/cc)) p) evaluates===> (p p) ((call/cc (call/cc (call/cc (call/cc (call/cc (lambda (x) x)))))) p) evaluates===> (p p) Catchy phrase 1. self-application is the fixpoint of call/cc Corollary 1. Y combinator via call/cc -- Y combinator without an explicit self-application. (define (Y f) ((lambda (u) (u (lambda (x) (lambda (n) ((f (u x)) n))))) (call/cc (call/cc (lambda (x) x))))) Here, we used a fact that ((lambda (u) (u p)) (call/cc call/cc)) and ((lambda (u) (u p)) (lambda (x) (x x))) are observationally equivalent. Corollary 2. factorial via call/cc (define fact (lambda (f) (lambda (n) (if (<= n 0) 1 (* n (f (- n 1))))))) then, given the above definition of Y, (list ((Y fact) 5) ((Y fact) 6) ((Y fact) 7)) does return '(120 720 5040) Additional reference: http://google.com/groups?selm=7eb8ac3e.0302131759.4735faf3%40posting.google.com To avoid possible misunderstandings, it's important to realize what the theorem says, and specifically what it does not say. The theorem says that ((call/cc call/cc) p) and ((lambda (x) (x x)) p) are observationally equivalent in CBV provided that p is a value. The theorem emphatically does not say that (call/cc call/cc) and (lambda (x) (x x)) are observationally equivalent -- because they are generally not. The equivalence holds only in specific contexts. One of them is an an application to a value. The lemmas below should make that clear. Here are a few examples: (define (foo self) (if (procedure? self) (self #f) (begin (display "!") (newline)))) (define (exp1) (let ((f (call/cc call/cc))) (f (begin (display "OK") foo)))) (define (exp2) (let ((f (lambda (x) (x x)))) (f (begin (display "OK") foo)))) (define (exp3) (let ((f (lambda (y) ((call/cc call/cc) y)))) (f (begin (display "OK") foo)))) Evaluating (exp1) prints OKOK! whereas evaluating (exp2) and (exp3) prints just OK!. Clearly, (exp1) can't be equivalent to (exp2). That is, eta-expansion is not a sound operation when applied to (call/cc call/cc). First-class continuations can be tricky. That's why it helps to be formal. Proof of Theorem 1. We will be using the CPS transformation, the equations for which are taken verbatim from Danvy and Filinski "Abstracting Control" (p. 2). CPS is a source-to-source translation. Therefore, it behooves us to use Scheme's syntax-transformers. (define-syntax CPS (syntax-rules (lambda call/cc p) ((CPS (?e1 ?e2)) ; application (lambda (k) ((CPS ?e1) (lambda (f) ((CPS ?e2) (lambda (a) ((f a) k))))))) ((CPS (lambda (x) ?e)) ; abstraction (lambda (k) (k (lambda (x) (CPS ?e))))) ((CPS call/cc) (lambda (k0) (k0 (lambda (p) (lambda (k) ((p (lambda (a) (lambda (k1) (k a)))) k)))))) ((CPS p) ; skolem constant for *any* value (lambda (k) (k pv))) ((CPS ?x) (lambda (k) (k ?x))))) We will be using Petite Chez Scheme, which provides a form (expand e) to macro-expand an expression e. For example, > (expand '(CPS (lambda (x) (x x)))) (lambda (#:k) (#:k (lambda (#:x) (lambda (#:k) ((lambda (#:k) (#:k #:x)) (lambda (#:f) ((lambda (#:k) (#:k #:x)) (lambda (#:a) ((#:f #:a) #:k))))))))) We can see the problem -- there are too many lambdas, some of which could be reduced. They are administrative lambdas. Because all our expressions here will assuredly terminate, we can use the full normalization, see Appendix. The Appendix indeed defines the normal-order lambda calculator. The particular algorithm (normalization as a yacc-style parsing) is described in great detail in http://pobox.com/~oleg/ftp/Haskell/Lambda_calc.lhs > (expand `(NORM ,(expand '(CPS (lambda (x) (x x)))))) (lambda (#:k) (#:k (lambda (#:x) (lambda (#:k) (#:x #:x #:k))))) which is far smaller a term to look at. It indeed matches the derivation by hand. However, this nested `expand' isn't nice, let alone cumbersome to type. We can integrate NORM into the CPS itself (doing essentially deforestation). (define-syntax CPS (syntax-rules (lambda call/cc p) ((CPS (?e1 ?e2) . args) ; application (NORM (lambda (k) ((CPS ?e1) (lambda (f) ((CPS ?e2) (lambda (a) ((f a) k)))))) . args)) ((CPS (lambda (x) ?e) . args) ; abstraction (NORM (lambda (k) (k (lambda (x) (CPS ?e)))) . args)) ((CPS call/cc . args) (NORM (lambda (k0) (k0 (lambda (p) (lambda (k) ((p (lambda (a) (lambda (k1) (k a)))) k))))) . args)) ((CPS p . args) ; skolem constant for *any* value (NORM (lambda (k) (k pv)) . args)) ((CPS ?x . args) (NORM (lambda (k) (k ?x)) . args)))) (define-syntax NORM (syntax-rules (lambda CPS) ((NORM t) (NORM t () ())) ((NORM (CPS e) env stack) (CPS e env stack)) ((NORM (lambda (x) e) env ()) (let-syntax ((ren (syntax-rules () ((ren ?x ?e ?env) (lambda (x) (NORM ?e ((?x () x) . ?env) ())))))) (ren x e env))) ((NORM (lambda (x) b) env ((enve e) . stack)) (NORM b ((x enve e) . env) stack)) ((NORM (e1 e2) env stack) (NORM e1 env ((env e2) . stack))) ((NORM x () ()) x) ((NORM x () ((enve e) ...)) (x (NORM e enve ()) ...)) ((NORM x env stack) (let-syntax ((find (syntax-rules (x) ((find ?x ((x ?envs ?es) . _) ?stack) (NORM ?es ?envs ?stack)) ((find ?x (_ . ?env) ?stack) (NORM ?x ?env ?stack))))) (find x env stack))) )) > (expand `(CPS (lambda (x) (x x)))) (lambda (#:k) (#:k (lambda (#:x) (lambda (#:k) (#:x #:x #:k))))) Now we can see the results in any Scheme system that has even the second-class macro-expander (e.g., Scheme48). We have added one rule to NORM, to force unexpanded CPS... Incidentally, we could distinguish between administrative and real lambdas. The administrative are the ones introduced by the CPS macro. We may choose a different name for those lambdas. The normalizer will reduce only those, and treat real lambdas as black boxes. But we will not do that here, because for our theorem, we need to reduce some `serious' lambdas as well. We use this CPS as a proof assistant anyway: if it takes too long, there is always Control-C. Lemma 1. CPS transform of ((lambda (x) (x x)) p) is (lambda (k) (pv pv k)) Proof: > (expand '(CPS ((lambda (x) (x x)) p))) (lambda (#:k) (pv pv #:k)) Lemma 2. CPS transform of (call/cc call/cc) is (lambda (k) (k (lambda (a) (lambda (k1) (k a))))) Proof: > (expand '(CPS (call/cc call/cc))) (lambda (#:k) (#:k (lambda (#:a) (lambda (#:k1) (#:k #:a))))) It indeed matches the derivation by hand. Lemma 3. CPS transform of (call/cc (call/cc call/cc)) is the same as that of (call/cc call/cc). Proof: > (expand '(CPS (call/cc (call/cc call/cc)))) (lambda (#:k) (#:k (lambda (#:a) (lambda (#:k1) (#:k #:a))))) Lemma 4. CPS transform of (call/cc (call/cc id)) is the same as that of (call/cc call/cc). Proof: > (expand '(CPS (call/cc (call/cc (lambda (u) u))))) (lambda (#:k) (#:k (lambda (#:a) (lambda (#:k1) (#:k #:a))))) Lemma 5. CPS transform of ((call/cc call/cc) p) is the same as that of ((lambda (x) (x x)) p) Proof: > (expand '(CPS ((call/cc call/cc) p))) (lambda (#:k) (pv pv #:k)) and see Lemma 1. Additional pairs of equivalences: (expand '(CPS (lambda (u) ((lambda (x) (x x)) u)))) (expand '(CPS (lambda (u) ((call/cc call/cc) u)))) The common transform is: (lambda (#:k) (#:k (lambda (#:u) (lambda (#:k) (#:u #:u #:k))))) (expand '(CPS ((lambda (u) (u p)) (call/cc call/cc)))) (expand '(CPS ((lambda (u) (u p)) (lambda (x) (x x))))) The common transform is: (lambda (#:k) (pv pv #:k)) Proof of the main theorem. Let p be a term, M be ((call/cc call/cc) p) and N be ((lambda (x) (x x)) p). By Plotkin's simulation theorem, CPS_transform_v{eval_v{M}} = eval_v{CPS_transform{M} (lambda (x) x)} where '=' means observational equivalence. By Lemma 5, CPS_transform{M} is identical to CPS_transform{N}. Therefore, CPS_transform_v{eval_v{M}} = CPS_transform_v{eval_v{N}} QED. The proof of the `fixpoint' follows from Lemmas 2-4 under trivial induction. In more detail about the observational equivalence: if C[] is a CBV context, and N is a term and C[N] -*>v V (term C[N] reduces to a value in CBV), then (CPS_transform{C[N]} \x.x) evaluates to a value under any strategy. Conversely, if C[N] does not evaluate to a value under CBV, neither will (CPS_transform{C[N]} \x.x) evaluate to one under any strategy (see Theorem 2 of Danvy/Filinski). Note that CPS_transform{C[N]} = CPS_transform{N} k (see the equation at the end of page 9 and beginning page 10, Section 2.5 of Danvy/Filinski). So, terms that have identical CPS transforms are CBV observationally equivalent (that is, have the same terminating behavior when plugged into any CBV context). ---- Appendix. The Lambda-calculator as a simple Scheme syntax-rule macro. The first attempt: ;(define-syntax NORM ; (syntax-rules () ; ((NORM . args) ; (mtrace (NORM1 . args))))) (define-syntax NORM (syntax-rules (lambda) ((NORM e) (NORM e ())) ((NORM (lambda (x) e) ()) (lambda (x) (NORM e ()))) ((NORM t ((lambda x) . stack)) ; lambda as a high-priority application (NORM (lambda (x) t) stack)) ((NORM (lambda (x) b) (e . stack)) ; redex (letrec-syntax ((ren (syntax-rules () ((ren ynew y z ?x ?e ?stack) (let-syntax ((alpha (syntax-rules () ((alpha y ??e) (beta z ?x ??e ((lambda y) . ?stack)))))) (alpha ynew ?e))))) (beta (syntax-rules (lambda x) ((beta x ?x ?e ?stack) (NORM ?e ?stack)) ((beta (lambda (x) z) ?x ?e ?stack) (NORM (lambda (?x) z) ?stack)) ((beta (lambda (y) z) ?x ?e ?stack) (let-syntax ((gensym (syntax-rules () ((gensym . args) (ren y . args))))) (gensym y z ?x ?e ?stack))) ((beta (?e1 ?e2) ?b ?e ?stack) (NORM (((lambda (x) ?e1) ?e) ((lambda (x) ?e2) ?e)) ?stack)) ((beta y ?x ?e ?stack) (NORM y ?stack))))) (beta b x e stack))) ((NORM (e1 e2) stack) (NORM e1 (e2 . stack))) ((NORM x ()) x) ((NORM x stack) (unwind x stack)) )) (define-syntax unwind (syntax-rules () ((unwind t (t1 ...)) (t (NORM t1 ()) ...)))) Some tests: ,expand (NORM ((lambda (x) (x x)) (lambda (y) (y z)))) ==> '(z z) ,expand (NORM ((lambda (x) (lambda (x) x)) 1)) ==> (lambda (x) x) ,expand (NORM ((lambda (x) (lambda (y) (((f x) y) y))) (g y))) ==> (lambda (z) (f (g y) z z)) ,expand (NORM (lambda (a) ((lambda (x) (lambda (a) (a x))) (a x)))) ==> (lambda (a) (lambda (a1) (a1 (a x)))) ,expand (NORM (lambda (a) ((lambda (x) (lambda (a) (x a))) a))) ==> (z^z), ,expand (NORM (lambda (a) ((lambda (x) (lambda (b) (x a))) a))) => (a^b^a#a) ,expand (NORM ((((lambda (a) (lambda (x) (lambda (a) (a x)))) (a x)) 1) 2)) ==> (2 1) ,expand (NORM (((lambda (a) (lambda (x) (a (x a)))) 1) 2)) ==> (1 (2 1)) ,expand (NORM (((lambda (a) ((lambda (x)(lambda (a) (a x))) (a x))) 1) list)) ==> (list (1 x)) A good test of hygiene ,expand (NORM (((lambda (a) ((lambda (x) (lambda (a) (x a))) a)) list) 1)) ==> (list 1) and evaluates to (1) Show it is really a normal-order evaluator ,expand (NORM (((lambda (f) (lambda (x) x)) ((lambda (x) (x x)) (lambda (x) (x x)))) 1)) ==> 1 ,expand (NORM ((lambda (c) (lambda (f) (lambda (x) (f ((c f) x))))) (lambda (f) f))) ,expand (NORM ((((lambda (c) (lambda (f) (lambda (x) (f ((c f) x))))) (lambda (f) f)) list) 0)) (((NORM ((lambda (c) (lambda (f) (lambda (x) (f ((c f) x))))) (lambda (f) f))) (lambda (u) (+ 1 u))) 0) ==> 2 That normalizer is sound. However, the lazy substitution (((lambda (x) ?e1) ?e) ((lambda (x) ?e2) ?e)) replicates 'e' and builds bigger and bigger terms. Chances are, ?e1 may have no occurrence of 'x', in that case, carrying '?e' around is just a waste. As it turns out, for complex expressions, the normalizer quickly runs out of memory. The problem is not that we're lazy in doing substitutions. The problem is that we are not lazy enough. The second attempt: (define-syntax NORM (syntax-rules (lambda) ((NORM t) (NORM t () ())) ((NORM (lambda (x) e) env ()) (let-syntax ((ren (syntax-rules () ((ren ?x ?e ?env) (lambda (x) (NORM ?e ((?x () x) . ?env) ())))))) (ren x e env))) ((NORM (lambda (x) b) env ((enve e) . stack)) (NORM b ((x enve e) . env) stack)) ((NORM (e1 e2) env stack) (NORM e1 env ((env e2) . stack))) ((NORM x () ()) x) ((NORM x () ((enve e) ...)) (x (NORM e enve ()) ...)) ((NORM x env stack) (let-syntax ((find (syntax-rules (x) ((find ?x ((x ?envs ?es) . _) ?stack) (NORM ?es ?envs ?stack)) ((find ?x (_ . ?env) ?stack) (NORM ?x ?env ?stack))))) (find x env stack))) )) We notice that NORM has two 'reductions': - expressions are reduced with respect to the stack - variables are reduced with respect to the environment We can see that we indeed: - find the left-most innermost context for each redex - beta reductions are done correctly (bound variables are effectively renamed) Now, let's investigate (shift shift), etc. We define shift to have the same signature as call/cc: (define (shift* p) (shift f (p f))) (define-syntax CPS (syntax-rules (lambda call/cc shift*) ((CPS (?e1 ?e2) . args) ; application (NORM (lambda (k) ((CPS ?e1) (lambda (f) ((CPS ?e2) (lambda (a) ((f a) k)))))) . args)) ((CPS (lambda (x) ?e) . args ) ; abstraction (NORM (lambda (k) (k (lambda (x) (CPS ?e)))) . args)) ((CPS call/cc . args) (NORM (lambda (k0) (k0 (lambda (p) (lambda (k) ((p (lambda (a) (lambda (k1) (k a)))) k))))) . args)) ((CPS shift* . args) (NORM (lambda (k0) (k0 (lambda (p) (lambda (c) ((p (lambda (x) (lambda (c2) (c2 (c x))))) (lambda (v) v)))))) . args)) ((CPS ?x . args) (NORM (lambda (k) (k ?x)) . args)))) > (expand '(CPS (shift* shift*))) (lambda (#:k) (#:k (lambda (#:x) (lambda (#:c2) (#:c2 #:x))))) > (expand '(CPS ((shift* shift*) p))) (lambda (#:k) (#:k p)) > (expand '(CPS ((lambda (x) x) p))) (lambda (#:k) (#:k p)) So, (shift* shift*) is the identity function.