From SRS0=XK3e=NL=google.com=posting-system@bounce2.pobox.com Sun Oct 24 23:19:46 2004 Date: Sun, 24 Oct 2004 16:19:36 -0700 From: oleg-at-pobox.com Newsgroups: comp.lang.scheme Subject: Zipper in scheme Message-ID: <7eb8ac3e.0410241519.3f2b3e50@posting.google.com> Status: OR Zipper is a very handy data structure that lets us replace an item deep in a complex data structure, e.g., a tree or a term, without any mutation. The resulting data structure will share as much of its components with the old structure as possible [see addendum]. The old data structure is still available (which can be handy if we wish to 'undo' the operation later on). Zipper is essentially an `updateable' and yet pure functional cursor into a data structure. Useful references: http://www.nist.gov/dads/HTML/zipper.html http://citeseer.ist.psu.edu/hinze01web.html Zipper is a _delimited continuation_ reified as a data structure. Somehow that idea is not commonly discussed in the zipper literature. Because Scheme has first-class and delimited continuations, we can derive and use zipper far more easily. Given below is a derivation of zipper and an example of its use: swapping out of two branches of a two trees. The latter is a typical cross-over operation in genetic programming. noelwelsh@gmail.com (Noel Welsh) wrote in message news:... > I would like to do the crossover in a purely functional manner. The > algorithm I envisage is: > - Count the number of nodes for each of the two trees > - Choose an random integer between 0 ... (number of nodes - 1) > This is the node where crossover will take place. As pointed out earlier, we don't need counting to select a random node from a tree. After we selected the node, we can zip down to that node in the tree using the eq? test. In the following however, we skip the random selection for simplicity and thus we shall be selecting nodes by their index in the depth-first order. To derive zipper, we first write the familiar depth-first traversal routine: Welcome to Scheme 48 1.1 > ,open srfi-9 > ,open escapes signals > ,load /usr/local/lib/scheme48/misc/shift-reset.scm ; deterministic, left-to-right map (define (map* f l) (if (null? l) l (cons (f (car l)) (map* f (cdr l))))) (define (depth-first handle tree) (cond ((null? tree) tree) ((handle tree) => (lambda (new-tree) new-tree)) ; the node was not handled -- descend ((not (pair? tree)) tree) ; an atom (else (cons (car tree) ; node name (map* (lambda (kid) (depth-first handle kid)) (cdr tree)))))) The function handle, the first-argument of depth-first, receives a node and should yield either a node or #f. In the first case, the return node replaces the existing node in the result tree. If the handle returned #f, it has declined to handle that node, so we keep that node and descend into it, if possible. To see how this works, we define two sample trees and print out their nodes: (define tree1 '(a (b) (c (d 1 2)) e)) (define tree2 '(z (u) (v (w 10 12)) y)) (depth-first (lambda (node) (display node) (newline) #f) tree1) ==> prints (a (b) (c (d 1 2)) e) (b) (c (d 1 2)) (d 1 2) 1 2 e ==> yields '(a (b) (c (d 1 2)) e) We can now define the zipper data structure: (define-record-type zzipper (zipper curr-node k) zipper? (curr-node z-curr-node) (k z-k)) It contains two fields: the current node of a tree, and the continuation. The continuation should receive a node or #f. In the former case, the received node will replace the existing node. In the latter case, we keep the existing node and continue the traversal. The continuation returns either a new zipper, or a tree (if the traversal is finished). One can see that zipper is in a sense an 'inverse' of the function handle. (define (zip-tree tree) (reset (depth-first (lambda (tree) (shift f (zipper tree f))) tree))) As promised, zipper is indeed a manifestation of a delimited continuation. We should point out that both the zipper record and the constructor function zip-tree are _generic_. They by themselves depend neither on the representation of the tree nor on the traversal strategy. All the information about the tree data structure and its traversal is encapsulated in one single function depth-first. We can switch from depth-first to breadth-first strategy or from a nested list to a nested vector realization of trees just by changing depth-first. Neither zipper, nor zip-tree, nor any code that uses zipper (see below) will require any modifications. This property of our zipper is in a marked contrast with Gerard Huet's derivation of zipper. In Gerard Huet's formulation, zipper does depend on the concrete realization of the data type: zipper is derived (pun intended) from the data type. Different data types (and different realizations of an abstract data type) will have different corresponding zipper structures. In our formulation, zipper is a _generic_ derivation (pun intended) on the traversal function. Zipper is a derivative of the traversal function -- mechanical derivative at that. So, shift/reset can be considered traversal function derivative operators. We can now print out the tree in a different way: (define (print-tree tree) (do ((cursor (zip-tree tree) ((z-k cursor) #f))) ((not (zipper? cursor))) (display (z-curr-node cursor)) (newline))) we use zipper, which is a cursor, to examine all of the tree, node by node. In a sense, we have inverted the operation of depth-first. (print-tree tree1) ; prints as before (print-tree tree2) (z (u) (v (w 10 12)) y) (u) (v (w 10 12)) (w 10 12) 10 12 y We introduce a few helpful functions (define (zip-all-the-way-up zipper) (if (zipper? zipper) (zip-all-the-way-up ((z-k zipper) (z-curr-node zipper))) zipper)) (define (locate-nth-node n tree) (do ((i 0 (+ 1 i)) (cursor (zip-tree tree) ((z-k cursor) #f))) ((and (= i n) (if (zipper? cursor) #t (error "too few nodes"))) cursor) )) And we are ready for some action: ; replace the 3-d node of tree1 with 'xxx (let ((desired-node (locate-nth-node 3 tree1))) (display "Replacing the node: ") (display (z-curr-node desired-node)) (newline) (zip-all-the-way-up ((z-k desired-node) 'xxx))) ==> prints Replacing the node: (d 1 2) ==> yieds '(a (b) (c xxx) e) It did replace it, didn't it? ; cross-over of the 3d node of tree1 and 1st node of tree2 (let* ((desired-node1 (locate-nth-node 3 tree1)) (_ (begin (display "Cross-over the node1: ") (display (z-curr-node desired-node1)) (newline))) (desired-node2 (locate-nth-node 1 tree2)) (_ (begin (display "Cross-over the node2: ") (display (z-curr-node desired-node2)) (newline))) (new-tree1 (zip-all-the-way-up ((z-k desired-node1) (z-curr-node desired-node2)))) (new-tree2 (zip-all-the-way-up ((z-k desired-node2) (z-curr-node desired-node1)))) ) (display "new tree1: ") (display new-tree1) (newline) (display "new tree2: ") (display new-tree2) (newline) ) ==> prints Cross-over the node1: (d 1 2) Cross-over the node2: (u) new tree1: (a (b) (c (u)) e) new tree2: (z (d 1 2) (v (w 10 12)) y) Well, it seems to work... If we swap the 3d node of tree1 and the 5th node of tree2, we get Cross-over the node1: (d 1 2) Cross-over the node2: 12 new tree1: (a (b) (c 12) e) new tree2: (z (u) (v (w 10 (d 1 2))) y) To conclude, delimited continuations are quite useful. They can be emulated in any R5RS Scheme system; yet it is better for performance if they are supported natively. Scheme48 does support delimited continuations natively (Martin Gasbichler and Michael Sperber, ICFP 2002). If your favorite Scheme system does not offer them by default, please complain to the implementors. It doesn't matter which particular delimited continuation operator (shift, control, shift0, splitter, cupto, etc) is supported -- all of them are equally expressible: Chung-chieh Shan, Scheme2004 workshop http://www.eecs.harvard.edu/~ccshan/recur/ Addendum, June 7, 2006 [inspired by a question from Andrew Wilcox] To be more precise, the zipper preserves sharing as much as the underlying enumerator does. The following is the maximal sharing preserving enumerator. Those two functions should replace the ones in the article. ; deterministic, left-to-right map ; It preserves sharing as much as possible: that is, if given the pair ; l == (cons h t), (and (eq? h (f h)) (eq? t (map* f t))) holds, then ; (eq? (map* f l) l) holds as well. (define (map* f l) (if (null? l) l (let ((h (car l)) (t (cdr l))) (let ((h1 (f h)) (t1 (map* f t))) (if (and (eq? h1 h) (eq? t1 t)) l (cons h1 t1)))))) (define (depth-first handle tree) (cond ((null? tree) tree) ((handle tree) => (lambda (new-tree) new-tree)) ; the node was not handled -- descend ((not (pair? tree)) tree) ; an atom (else (let ((kids1 (map* (lambda (kid) (depth-first handle kid)) (cdr tree)))) (if (eq? kids1 (cdr tree)) tree (cons (car tree) ; node name kids1)))))) To test that the new depth-first indeed preserves sharing, we evaluate (eq? tree1 (depth-first (lambda (node) (display node) (newline) #f) tree1)) which, after printing all nodes in depth-first order, gives the result #t. The tree returned by depth-first in this case is indeed the original tree as it is. The zipper code needs no changes, and it works as it was, with the same results. To test the sharing preservation, we first produce a tree by replacing the 6th node (which is y) in tree2: (define tree2* (let ((desired-node (locate-nth-node 6 tree2))) (display "Replacing the node: ") (display (z-curr-node desired-node)) (newline) (zip-all-the-way-up ((z-k desired-node) 'newy)))) here's the result: (z (u) (v (w 10 12)) newy) Now, we write a function that takes two trees, traverses them in lockstep and prints out the nodes and if they are shared: (define (tree-compare-sharing t1 t2) (do ((cursor1 (zip-tree t1) ((z-k cursor1) #f)) (cursor2 (zip-tree t2) ((z-k cursor2) #f))) ((cond ((and (zipper? cursor1) (zipper? cursor2)) #f) ((zipper? cursor1) (display "t2 finished early") #t) ((zipper? cursor2) (display "t1 finished early") #t) (else #t))) (let ((n1 (z-curr-node cursor1)) (n2 (z-curr-node cursor2))) (cond ((eq? n1 n2) (display "shared node: ") (display n1)) (else (display "t1 node: ") (display n1) (newline) (display "t2 node: ") (display n2))) (newline)))) (tree-compare-sharing tree2 tree2*) ===> t1 node: (z (u) (v (w 10 12)) y) t2 node: (z (u) (v (w 10 12)) newy) shared node: (u) shared node: (v (w 10 12)) shared node: (w 10 12) shared node: 10 shared node: 12 t1 node: y t2 node: newy