Self-quoting fixpoints: Quines, in Scheme and MetaOCaml

 

 

Quine as a fixpoint of an interpreter

Quines are ``self-reproducing programs'' -- or programs that output the copy of themselves.

Let IN be an interpreter, or evaluator: a function that takes an encoding E[t] of a term t to t's value. Then a quine q is a term such that its value is its encoding:

    	    IN E[q] = E[q]
from which follows that a (quoted) quine is a fixpoint of the interpreter. It is not the least fixpoint: since the interpreter is strict, its least fixpoint is bottom. Let us see the connection between quines and fixpoints in more detail, in Scheme and MetaOCaml.

 

Quines in Scheme

Most Scheme systems offer the interpreter as a primitive: eval. (On some Scheme systems, eval takes two arguments: the (quoted) expression to evaluate and the environment. For our purposes, take the second argument to be r5rs-environment.) The shortest fixpoint of eval is 1 (the numeral one). Indeed, (eval 1) is 1. The same holds for other numbers, as well as characters, strings, and booleans. These are called `self-evaluating data' -- which, come to think of it, is the synonym for the eval's fixpoint.

Can we find a fixpoint of eval that actually `does something'? The answer comes from the expression for the core of the fixpoint combinator (so-called ``the omega term''):

    ((lambda (f) (f f)) (lambda (f) (f f)))
One-step reduction of this term gives the same term. For a quine, however, we need a term whose reduction gives the quoted version of it (earlier denoted as E[q]). This is not difficult to arrange by putting quotes and quasi-quotes in the right places:
    (define ff ((lambda (f) `(,f ',f)) '(lambda (f) `(,f ',f))))

Indeed,

    > ff
    ((lambda (f) `(,f ',f)) '(lambda (f) `(,f ',f)))
    > (eval ff)
    ((lambda (f) `(,f ',f)) '(lambda (f) `(,f ',f)))
    > (equal? ff (eval ff))
    #t

The repeated re-evaluations of ff are similar to the stepping through the evaluation of the omega term.

We can re-write ff using only quasi-quote. That is not entirely trivial because now we have to mind the levels of quotation and quasi-quotation. But that expression will be more useful in other staged languages, which only have quasi-quotation and which insists on properly matching unquotes with quasi-quotes.

    (define ff1
       ((lambda (g) `(,g (,`quasiquote ,g))) 
          `(lambda (g) `(,g (,`quasiquote ,g)))))


    > ff1
      ((lambda (g) `(,g (,`quasiquote ,g)))
        `(lambda (g) `(,g (,`quasiquote ,g))))
    > (eval ff1)
      ((lambda (g) `(,g (,`quasiquote ,g)))
       `(lambda (g) `(,g (,`quasiquote ,g))))
    > (equal? ff1 (eval ff1))
      #t

Although these quines, especially ff, look simple, there is a lot going on behind the scenes of quasiquotation. To see this clearly, let us look at quines in MetaOCaml.

Version
The current version is October 2007

 

Quines in MetaOCaml

MetaOCaml is a superset of OCaml providing the type 'a code of code fragments (quoted expressions, so to speak) and two primitives for building them: typed and hygienic quasiquotation (written .<e>. where e is some expression) and unquotation (written .~e, where e must be an expression of code type). There is also an interpreter, the function Runcode.run : 'a code -> 'a.

The very type of Runcode.run tells that its fixpoint should have the recursive type:

    type rc = rc code
which can be easily defined in OCaml. The simplest non-trivial quine thus comes to mind:
    let rec q1 : rc = .<q1>.
Alas, it does not quite work: recursive bindings of such sort are not allowed (with a good reason, as we shall soon see). There is a workaround:
    let rec q2 : unit -> rc = fun () -> .<q2 ()>.
Let us try (the responses of the MetaOCaml toplevel are indented):
    let tq2 = q2 ();;
      val tq2 : rc = .<(* CSP q2 *) ()>. 
    let tq2r = Runcode.run tq2;;
      val tq2r : rc = .<(* CSP q2 *) ()>. 
    let tq2rr = Runcode.run tq2r;;
      val tq2rr : rc = .<(* CSP q2 *) ()>. 
It does look like the evaluations of tq2 reproduce the same term.

However, what is that CSP? It is a `cross-stage persistent value', yet another facility offered by MetaOCaml. If x is an identifier bound outside quotations (a.k.a. brackets .<...>.), it may appear as is inside brackets. Thereupon, the value bound to it is `lifted' into the brackets. For values like integers, floating-points, strings, etc. with the printable representation, this representation is inserted into the brackets:

    let x = "abc" in .<String.length x>.;;
      - : int code = .<Stdlib.String.length "abc">. 
For other values -- for example, functional values, which cannot be printed -- what is inserted into the brackets is the heap pointer to the value. When such code is printed, the pointer is schematically shown as (* CSP *). In the earlier example, if we try to compare tq2r and tq2rr:
    tq2r = tq2rr;;
      Exception: Invalid_argument "compare: functional value".
we get the exception from the comparison function: functional values cannot be compared. Those functional values in tq2r and tq2rr code are the closure produced by q2.

There is no problem interpreting code that contains such pointers to the interpreter heap: the interpreter follows the pointer and fetches the needed value. However, code with pointer-CSP is essentially not closed: it depends, implicitly, on the interpreter heap. Therefore, the code cannot be faithfully saved into a file or serialized and transmitted to another host. The CSP-relying quine tq2 is cheating. Even more so is q1, had it been accepted by OCaml. It would have created code with the pointer to itself -- which reminds of an old trick popularized by Olin Shivers, of writing recursive code without using recursion, relying on macros to build cyclic, infinite so to speak, code.

Let us turn away from cheating and towards proper quines like the earlier ff and ff1. Let us re-write them in MetaOCaml. First, the typing of self-application-like terms `(,f ',f) requires the recursive type:

    type af = (af -> rc) code
The straightforward transcription of ff into MetaOCaml is then:
    let qf1 =
       (fun (f:af) -> .<.~f .<.~f>.>.) 
      .<fun f -> .<.~f .<.~f>.>.>.;;
      val qf1 : rc code = .<
        (fun f_1 -> .< .~f_1 .< .~f_1  >.  >.) .< .~((* CSP f *))  >.>. 
Yet we see CSP again! Noticing that in .<.~f>. the adjacent brackets and escapes cancel each other, we obtain
    let qf2 = (fun (f:af) -> .<.~f f>.) .<fun f -> .<.~f f>.>.;;
      val qf2 : rc code = .<(fun f_2 -> .< .~f_2 f_2  >.) (* CSP f *)>. 
The identifier f, bound outside brackets, appears within brackets as it is. That CSP comes out should not be surprising then.

That lifting, or cross-stage persistence is involved in quines was not apparent in Scheme. It took MetaOCaml to bring it to straight to face. Looking at quines in other languages, we now notice they all rely on the ability to insert a quoted value (E[t], on our earlier notation) -- as is, quotes and such -- inside another quoted value. To put it another way, they all rely on the ability to add a level of quotation, to lift E[t] to E[E[t]] -- to persist a (quoted) value from one stage onto the latter stage.

The CSP appearing in qf1 and qf2, however, is different from the one in q2. The latter was a pointer to a functional value, which cannot be serialized or printed. In qf1, if we look closely, the CSP is a pointer to the closed code value .<fun f -> .<.~f f>.>., which could be printed, and hence inserted into the brackets. That is what the Scheme quine did. The current version of MetaOCaml does not lift code values, and creates a pointer-CSP instead. Nothing mandates such pointer-CSP however.

To demonstrate that the pointer-CSP is avoidable, we lift the code value by hand: by serializing it and inserting the resulting string.

    let qf3 = 
     (fun (f:af) -> .<.~f (Marshal.from_string .~(let x = Marshal.to_string f [] in .<x>.) 0)>.) 
     .<(fun f -> .<.~f (Marshal.from_string .~(let x = Marshal.to_string f [] in .<x>.) 0)>.)>.;;
    
      val qf3 : rc code = .<
       (fun f_3 -> .<.~f_3
             (Stdlib.Marshal.from_string
                .~(let x_4 = Stdlib.Marshal.to_string f_3 [] in .< x_4  >.) 0) 
           >.)
        (Stdlib.Marshal.from_string
           "\132\149\166\190\000...\000\200@@@"
           0)>.
      
    let qf3r = Runcode.run qf3;;
       val qf3r : rc = .<
        (fun f_3 -> .<.~f_3
             (Stdlib.Marshal.from_string
                .~(let x_4 = Stdlib.Marshal.to_string f_3 [] in .< x_4  >.) 0) 
           >.)
        (Stdlib.Marshal.from_string
           "\132\149\166\190\000..\000\200@@@"
           0)>.
    
    let open Format in
     let s1 = print_code str_formatter qf3; flush_str_formatter () in
     let s2 = print_code str_formatter qf3r; flush_str_formatter () in
     s1 = s2;;
        - : bool = true
The quine contains a rather long binary string, abbreviated in the above listing. We confirmed that the printed representation of qf3 and of the result of Runcode.run qf3 are identical. Hence qf3 is a quine.
Version
The current version is December 2020
References
MetaOCaml -- an OCaml dialect for multi-stage programming

Olin Shivers. Stylish Lisp programming techniques. Technique n: passing circular list structures to the evaluator
<http://www.ccs.neu.edu/home/shivers/newstyle.html>

 

The simplest quine

We now demonstrate the simplest non-degenerate quine -- the simplest quine different from self-evaluating data like integers, strings, and other printable values.

Recall a quine is a fixpoint of an interpreter, and hence is determined by the interpreter. What is the interpreter, that is, what is the language to write quines in -- is our choice. Let us generalize the Runcode.run interpreter of MetaOCaml to run1:

    type aff = (aff -> aff) code
    let run1 (c:aff) = Runcode.run c c
The run1 interpreter thus expects the code to be a function that takes its own code as the argument (which it is free to ignore). Looking carefully at the type (af -> af) code gives the intimation of what such code may look like:
    let qq = .<fun f -> f>.;;
       val qq : ('a -> 'a) code = .<fun f_5 -> f_5>. 
It is indeed the identity. It is also undoubtedly a quine, with respect to run1:
    let qqr = run1 qq
      val qqr : aff = .<fun f_5 -> f_5>. 
    let qqrr = run1 qqr;;
      val qqrr : aff = .<fun f_5 -> f_5>. 

 

Epilogue

The article was originally written in October 2007. MetaOCaml at that time used environment classifiers, which posed many problems and prevented the presentation of the complete MetaOCaml quine. Environment classifiers have been eliminated since. The now re-written MetaOCaml part succeeds in showing a quine, with no CSPs.

An astute reader must have realized that quines being a fixpoint of the interpreter is no accident: it is the consequence of the fact that a quine is sfix E[id], where id is the identity function and sfix is the staged fixpoint combinator, arising from Kleene's second recursion theorem. Unrolling sfix E[id] using the expression for sfix given by the proof of Kleene theorem, we obtain that a quine is

    (\f -> a f (b f)) E[\f -> a f (b f)]
where a is the combinator that builds an application (in MetaOCaml, it is fun x y -> .<.~x .~y>.) and b is the lifting combinator, converting E[t] to E[E[t]]. Scheme and MetaOCaml quines presented in this article are the manifestations of this general quine expression.